US 20120008864A1
(19) United States
(12) Patent Application Publication
     Kanatsu et al.
(10) Pub. No.: US 2012/0008864 A1
(43) Pub. Date: Jan. 12, 2012

(54) IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND COMPUTER READABLE MEDIUM
(75) Inventors: Tomotoshi Kanatsu, Tokyo (JP); Hidetomo Sohma, Yokohama-shi (JP); Reiji Misawa, Tokyo (JP); Ryo Kosaka, Tokyo (JP)
(73) Assignee: CANON KABUSHIKI KAISHA, Tokyo (JP)
(21) Appl. No.: 13/162,266
(22) Filed: Jun. 16, 2011
(30) Foreign Application Priority Data
     Jul. 6, 2010 (JP) 2010-154361

Publication Classification
(51) Int. Cl. G06K 9/34 (2006.01)
(52) U.S. Cl. 382/176

(57) ABSTRACT
An apparatus comprises: unit configured to divide input document data into a body region, a caption region, and an object region; unit configured to acquire text information included in each of the body region and the caption region; unit configured to search the text information in the body region for an anchor term, to extract an anchor term from the text information in the caption region, and to generate a bi-directional link between a portion corresponding to the anchor term in the body region and a portion of the object region to which the caption region is appended; and unit configured to convert the input document data into digital document data in which the portion corresponding to the anchor term in the body region and the portion corresponding to the object region to which the caption region is appended are bi-directionally linked based on the link.

[Representative drawing: sample document with reference numerals 1001, 1002, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1016, and 1017. Legible text: "AS SHOWN IN ...", "SALES OF LAST YEAR...", "FIGURE 1 AAA", "WHICH SHOWS BBB TRANSITION AFTER XX-TH YEAR...".]

[Sheet 1 of 35: system configuration diagram. Labels: OFFICE A, PROXY SERVER, NETWORK, MFP, CLIENT PC, LAN. Reference numerals 100 to 104; the assignment of numerals to elements is not recoverable from the extraction.]

[Sheet 3 of 35: drawing text not recoverable from the extraction.]

[Sheet 4 of 35: FIG. 4, block diagram of the link processing unit 304.]
  401 LINK OBJECT SELECTION UNIT
  402 ANCHOR TERM IN CAPTION EXTRACTION UNIT
  403 ANCHOR TERM IN BODY SEARCH UNIT
  404 FIELD GENERATION UNIT
  405 LINK CONFIGURATION INFORMATION GENERATION UNIT
  406 ACTION DEFINITION GENERATION UNIT
  407 LINK PROCESSING CONTROL UNIT
  FIELD DEFINITION (reference numeral not legible)
  308 LINK CONFIGURATION MANAGEMENT TABLE

[Sheet 5 of 35: FIG. 5A, sample page with reference numerals 500 to 506. Legible text: "SAMPLE", "...FIGURE 1...", "RELATED TO AAA."]

[Sheet 5 of 35: FIG. 5B, table of region information. Columns cover the region attribute, whether a caption is appended, coordinate X, coordinate Y, width W, height H, page, and text information. Legible rows: 501 HEADING REGION (X1, Y1, W1, H1, page 1, text "SAMPLE"); 502 TABLE REGION (X2, Y2, W2, H2, page 1); a PHOTO REGION (X3, Y3, W3, H3, page 1, appended with a caption); a CAPTION REGION (page 1, text containing "FIGURE 1" and "AAA"); a BODY REGION (page 1, text "...FIGURE 1... RELATED TO AAA."). The remaining cells are not recoverable.]

[Sheet 5 of 35: correspondence table between attribute and processing (figure label not legible).]
  ATTRIBUTE              TEXT                PHOTO            LINE IMAGE          TABLE
  CONVERSION PROCESSING  VECTOR CONVERSION   IMAGE CLIPPING   VECTOR CONVERSION   VECTOR CONVERSION (TABLE FRAME PORTION)
  ERASE PROCESSING       ON                  ON               ON                  ON

[Sheet 7 of 35: FIG. 7, flowchart.]
  START
  INITIALIZE LINK INFORMATION
  S702 DIVIDE INTO REGIONS
  S703 ADD ATTRIBUTE INFORMATION
  S704 CHARACTER RECOGNITION
  S705 LINK PROCESSING [TO FIG. 8]
  S706 FORMAT CONVERSION
  S707 SEND (FOR ONE PAGE)
  S708 (decision with YES/NO branches; condition text not legible)
  S709 SEND (ACTION DEFINITION)
  END

[Sheet 8 of 35: FIG. 8, flowchart of the link processing.]
  START
  Processing for text region:
    S801 TEXT REGION TO BE PROCESSED FOUND?
    S802 ANCHOR TERM EXTRACTED?
    S803 GENERATE FIELD FOR ANCHOR TERM IN BODY REGION
    S804 UPDATE PROCESSING OF LINK CONFIGURATION INFORMATION [TO FIG. 9A]
    S805 GENERATE FIELD OF BUTTON APPENDED TO ANCHOR TERM IN BODY REGION
    S806 ANCHOR TERM TO BE PROCESSED FOUND?
  Processing for caption appended region:
    S807 CAPTION APPENDED REGION TO BE PROCESSED FOUND?
    S808 ANCHOR TERM EXTRACTED?
    GENERATE FIELD FOR CAPTION APPENDED REGION
    S810 UPDATE PROCESSING OF LINK CONFIGURATION INFORMATION [TO FIG. 9B]
  END OF ALL REGIONS [TO STEP S705 IN FIG. 7]

[Sheet 9 of 35: FIG. 9A, flowchart.]
  START [FROM STEP S804 IN FIG. 8]
  S901 DATA ROW INCLUDING SAME ANCHOR TERM IN BODY FOUND?
  S902 DATA ROW INCLUDING SAME ANCHOR TERM IN CAPTION FOUND?
  S903 ADD NEW DATA ROW TO LINK CONFIGURATION MANAGEMENT TABLE
  S904 DESCRIBE CHARACTER STRING OF ANCHOR TERM IN "ANCHOR TERM IN BODY" COLUMN
  S905 INCREMENT "ANCHOR TERM IN BODY APPEARANCE COUNT" BY +1
  S906 ADD TO "ANCHOR TERM IN BODY REGION FIELD IDENTIFIER" COLUMN
  END [RETURN TO STEP S804 IN FIG. 8]

[Sheet 9 of 35: FIG. 9B, flowchart.]
  START [FROM STEP S810 IN FIG. 8]
  S911 DATA ROW INCLUDING SAME ANCHOR TERM IN BODY FOUND?
  S912 ADD NEW DATA ROW TO LINK CONFIGURATION MANAGEMENT TABLE
  S913 DESCRIBE CHARACTER STRING OF ANCHOR TERM IN "ANCHOR TERM IN CAPTION" ITEM
  S914 ADD TO "CAPTION APPENDED REGION FIELD IDENTIFIER" ITEM
  END [RETURN TO STEP S810 IN FIG. 8]
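The update of the link configuration management table in FIGS. 9A and 9B can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The class names, attribute names, and field identifiers such as "text_1" are hypothetical, and the YES/NO branch targets are an assumed reading of the flowcharts, since the arrows are not recoverable from the drawing text: a YES at S901 is taken to skip to S905, a YES at S902 to skip to S904, and a YES at S911 to skip to S913.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LinkRow:
    """One data row of the link configuration management table (names assumed)."""
    anchor_in_body: Optional[str] = None
    body_appearance_count: int = 0
    body_field_ids: List[str] = field(default_factory=list)
    anchor_in_caption: Optional[str] = None
    caption_region_field_ids: List[str] = field(default_factory=list)


class LinkConfigTable:
    """Hypothetical stand-in for the link configuration management table 308."""

    def __init__(self) -> None:
        self.rows: List[LinkRow] = []

    def update_for_body_anchor(self, term: str, field_id: str) -> LinkRow:
        """Assumed reading of FIG. 9A (entered from step S804)."""
        # S901: is there a row with the same anchor term in body?
        row = next((r for r in self.rows if r.anchor_in_body == term), None)
        if row is None:
            # S902: is there a row with the same anchor term in caption?
            row = next((r for r in self.rows if r.anchor_in_caption == term), None)
            if row is None:
                # S903: add a new data row
                row = LinkRow()
                self.rows.append(row)
            # S904: record the anchor term string in the body column
            row.anchor_in_body = term
        # S905: increment the appearance count
        row.body_appearance_count += 1
        # S906: record the field identifier of the anchor term in the body region
        row.body_field_ids.append(field_id)
        return row

    def update_for_caption_anchor(self, term: str, field_id: str) -> LinkRow:
        """Assumed reading of FIG. 9B (entered from step S810)."""
        # S911: is there a row with the same anchor term in body?
        row = next((r for r in self.rows if r.anchor_in_body == term), None)
        if row is None:
            # S912: add a new data row
            row = LinkRow()
            self.rows.append(row)
        # S913: record the anchor term string in the caption item
        row.anchor_in_caption = term
        # S914: record the field identifier of the caption appended region
        row.caption_region_field_ids.append(field_id)
        return row


table = LinkConfigTable()
table.update_for_body_anchor("Figure 1", "text_1")
table.update_for_caption_anchor("Figure 1", "image_1")
table.update_for_body_anchor("Figure 1", "text_2")
```

Under these assumptions, a row that holds both an anchor term in body and a caption appended region field identifier carries the information needed to link the two portions in both directions.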
Description: