
Change spellchecking to hunspell in TinyMCE

Over the last few years, I have had issues with application servers running the Tiny spellchecking service: they used a large amount of CPU and sometimes even hung. I ended up disabling spellchecking in the Tiny Editors’ config.js.

SharedDirectory/customization/javascript/tiny/editors/connections/config.js

...
// Set to false to disable Tiny's spell checking service in TinyMCE and Textbox.io.
spellingServiceEnabled: false,
...
Note: I worked with HCL and Tiny Support on these issues, and they provided updates over the last year. The problem was supposed to be fixed as of TinyMCE 5.9.

Now, after updating to the current editor version, TinyMCE 5.10.2, we decided to re-enable the spellchecker, and for the first few days it looked like the issue was really resolved. Sadly, after about a week, the first application server started to use 800% CPU, and that was just the server hosting the spelling service.

In the application server logs, we found messages like:

Two things stand out: we see debug messages without having enabled a trace, and at the top of the image, a request ran for more than 1000 ms.

Support sent me the steps to disable the debug messages:

  1. Create a file called /opt/ephox/logback.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
   <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
       <target>System.out</target>
       <encoder>
           <pattern>%date{yyyy-MM-dd HH:mm:ss.SSSX} [%thread] %-5level %logger{36} - %msg%n</pattern>
       </encoder>
   </appender>
   <logger name="ironbark" level="WARN"/>
   <root level="INFO">
       <appender-ref ref="CONSOLE"/>
   </root>
</configuration>

The important part is line 9, the ironbark logger. It is set to DEBUG in TinyMCE 5.10.2; setting it to WARN or ERROR prevents these log messages.

  2. Add a custom JVM property (Server > Server Types > WebSphere Application Servers > server name > Process Definition > Java Virtual Machine > Custom Properties) to the application server where you installed the spellchecker.
logback.configurationFile: /opt/ephox/logback.xml
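
Before restarting the server, it can help to confirm that the file really silences the debug output. The following is only a minimal sketch: the function names logger_level and suppresses_debug are made up for this example and are not part of any Tiny or HCL tooling. It reads the level of a named logger from a logback configuration and checks whether that level is stricter than DEBUG.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Logback levels from most to least verbose.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "OFF"]


def logger_level(xml_text: str, logger_name: str) -> Optional[str]:
    """Return the level attribute of the named <logger>, or None if it is absent."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    for logger in root.iter("logger"):
        if logger.get("name") == logger_name:
            return logger.get("level")
    return None


def suppresses_debug(level: str) -> bool:
    """True if the given level is stricter than DEBUG (INFO, WARN, ERROR, OFF)."""
    return LEVELS.index(level.upper()) > LEVELS.index("DEBUG")


# Hypothetical usage against the file created in step 1:
#   text = open("/opt/ephox/logback.xml", encoding="utf-8").read()
#   level = logger_level(text, "ironbark")
#   print(level, suppresses_debug(level) if level else "logger not found")
```

If the logger is missing from the file, it inherits the root level, so a result of None does not necessarily mean that debug messages are enabled.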

After this, the performance was slightly better, but still not good.

Today, I got the following update from Tiny:

Broadly, we believe that the WinterTree spelling library is having problems with long words with possible hyphens, especially in German. In this case, we recommend trying the Hunspell library instead.

We can see that the problem language is always German, and the number of characters is higher than 20. Due to implementation aspects with how WinterTree’s spelling engine works, these cases can be particularly problematic.

The most egregious offender is:

Took 25270 milliseconds.

Which meant that it took over 25 seconds to generate suggestions for 1 word in a document. As you can imagine, when this starts happening, sending lots of words becomes a problem. However, there aren’t many words that take more than 1 second to generate, because this is the entire list in the logs sent to us.

In general, you could likely avoid this behavior by using Hunspell libraries, particularly for German. Here is our documentation about adding Hunspell dictionaries to Spellchecker Pro. You likely have specific separate instructions for setting up Hunspell, but it will be effectively the same under the hood, as it’s a server-only setting.

https://www.tiny.cloud/docs/tinymce/6/self-hosting-hunspell/

Tiny/HCL Support

I could stop here and point you to support, but I ran into some issues while activating Hunspell.

First, the webpage says, “Tiny provides two downloadable bundles of Hunspell dictionaries,” which I couldn’t find. So I searched for other download options. The best match was the set of dictionaries included with LibreOffice: https://github.com/libreoffice/dictionaries, but the folder structure and naming do not match what Tiny expects.

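#!/usr/bin/env bash

# Clone the LibreOffice dictionaries and copy them into the layout Tiny expects:
# /opt/ephox/hunspell-dictionaries/<locale>/<locale>.aff and <locale>.dic
git clone https://github.com/LibreOffice/dictionaries.git /tmp/dictionaries
for i in af_ZA da de_DE en_AU en_CA en_GB en_US es fr hu it_IT nb_NO nl_NL nn pl pt_BR pt_PT sv_FI sv_SE ; do
  mkdir -p "/opt/ephox/hunspell-dictionaries/$i"
  # Quote the pattern so the shell passes it to find unexpanded.
  # A short code such as "es" may match several files; the last copy wins.
  find /tmp/dictionaries -iname "$i*.aff" -exec cp {} "/opt/ephox/hunspell-dictionaries/$i/$i.aff" \;
  find /tmp/dictionaries -iname "$i*.dic" -exec cp {} "/opt/ephox/hunspell-dictionaries/$i/$i.dic" \;
done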

This script creates the expected folder structure and copies the dictionaries to the right place.

tree /opt/ephox/hunspell-dictionaries/
/opt/ephox/hunspell-dictionaries/
├── af_ZA
│   ├── af_ZA.aff
│   └── af_ZA.dic
├── da
│   ├── da.aff
│   └── da.dic
├── de_DE
│   ├── de_DE.aff
│   └── de_DE.dic
├── en_AU
│   ├── en_AU.aff
│   └── en_AU.dic
├── en_CA
│   ├── en_CA.aff
│   └── en_CA.dic
├── en_GB
│   ├── en_GB.aff
│   └── en_GB.dic
├── en_US
│   ├── en_US.aff
│   └── en_US.dic
├── es
│   ├── es.aff
│   └── es.dic
├── fr
│   ├── fr.aff
│   └── fr.dic
├── hu
│   ├── hu.aff
│   └── hu.dic
├── it_IT
│   ├── it_IT.aff
│   └── it_IT.dic
├── nb_NO
│   ├── nb_NO.aff
│   └── nb_NO.dic
├── nl_NL
│   ├── nl_NL.aff
│   └── nl_NL.dic
├── nn
│   ├── nn.aff
│   └── nn.dic
├── pl
│   ├── pl.aff
│   └── pl.dic
├── pt_BR
│   ├── pt_BR.aff
│   └── pt_BR.dic
├── pt_PT
│   ├── pt_PT.aff
│   └── pt_PT.dic
├── sv_FI
│   ├── sv_FI.aff
│   └── sv_FI.dic
└── sv_SE
    ├── sv_SE.aff
    └── sv_SE.dic

19 directories, 38 files
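
If you would rather not compare the tree output by eye, the layout can also be checked with a small script. This is a sketch under the assumption that each language needs exactly one .aff and one .dic file named after its folder, as in the listing above; the function name missing_dictionary_files is invented for this example.

```python
from pathlib import Path
from typing import Iterable, List


def missing_dictionary_files(base_dir: Path, locales: Iterable[str]) -> List[Path]:
    """Return the expected <locale>/<locale>.aff and .dic paths that do not exist."""
    missing = []
    for locale in locales:
        for suffix in (".aff", ".dic"):
            expected = Path(base_dir) / locale / f"{locale}{suffix}"
            if not expected.is_file():
                missing.append(expected)
    return missing


# Hypothetical usage with the path and a few of the locales from this article:
#   gaps = missing_dictionary_files(Path("/opt/ephox/hunspell-dictionaries"),
#                                   ["de_DE", "en_US", "es", "fr"])
#   for path in gaps:
#       print("missing:", path)
```

An empty result only shows that the files are in place; it says nothing about whether the copied dictionary is the regional variant you wanted.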

Now we have to enable the Hunspell dictionaries in /opt/ephox/application.conf and restart the spellchecking service.

cat /opt/ephox/application.conf
ephox {
	allowed-origins {
		origins = [
			"http://cnx7-rh8-was.stoeps.home",
			"https://cnx7-rh8-was.stoeps.home",
			"https://cnx7-rh8.stoeps.home",
			"http://cnx7-rh8-was.stoeps.home:9081",
			"https://cnx7-rh8-was.stoeps.home:9444"
		]
	}
	spelling {
		hunspell-dictionaries-path = "/opt/ephox/hunspell-dictionaries"
	}
}

Don’t forget to enable spell checking in SharedDirectory/customization/javascript/tiny/editors/connections/config.js

...
// Set to false to disable Tiny's spell checking service in TinyMCE and Textbox.io.
spellingServiceEnabled: true,
...

Results

I tested with WinterTree (default) and Hunspell.

Testing some long words with WinterTree

[7/12/22 17:35:35:152 UTC] 00000132 SystemOut     O 2022-07-12 17:35:35.152Z [ioapp-compute-1] INFO  ironbark - request [ uuid-47ac0625-f6dc-4876-8127-59b50595cd0f ] Response => Status: 200 OK (12 ms)
[7/12/22 17:35:35:212 UTC] 00000139 SystemOut     O 2022-07-12 17:35:35.212Z [ioapp-compute-4] DEBUG ironbark - request [ uuid-ac3ac5bf-eb12-4e72-a98b-a9c93f288093 ] Spellall (100.0 % - 1 / 1 incorrect)
[7/12/22 17:35:35:212 UTC] 00000139 SystemOut     O 2022-07-12 17:35:35.212Z [ioapp-compute-4] DEBUG ironbark - request [ uuid-ac3ac5bf-eb12-4e72-a98b-a9c93f288093 ] Spellall (1 words) (BEGIN)
[7/12/22 17:35:38:865 UTC] 00000139 SystemOut     O 2022-07-12 17:35:38.865Z [ioapp-compute-4] WARN  ironbark -

          request [ uuid-ac3ac5bf-eb12-4e72-a98b-a9c93f288093 ] PERFORMANCE_ALERT: word took longer than 1000 milliseconds. Took 3652 milliseconds.

          * Language: de
          * Number of characters: 48
          * Number of hyphens: 0
          * Number of apostrophes: 0
          * Number of suggestions generated: 16


[7/12/22 17:35:38:865 UTC] 00000139 SystemOut     O 2022-07-12 17:35:38.865Z [ioapp-compute-4] DEBUG ironbark - request [ uuid-ac3ac5bf-eb12-4e72-a98b-a9c93f288093 ] Spellall (1 words) (END)
[7/12/22 17:35:38:866 UTC] 00000139 SystemOut     O 2022-07-12 17:35:38.866Z [ioapp-compute-4] INFO  ironbark - request [ uuid-9347efc7-7705-4bcb-911c-1506d1d3b90a ] Response => Status: 200 OK (3726 ms)

The request took about 3.7 seconds in total, 3652 ms of which were spent on a single word that was 48 characters long.
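
To find all such words in a larger SystemOut.log, the PERFORMANCE_ALERT entries can be collected with a few regular expressions. The sketch below assumes the multi-line layout shown in the excerpt above (the alert line followed by bulleted details); the names SlowWord and parse_performance_alerts are hypothetical and exist only for this example.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Patterns derived from the log excerpt above; other versions may log differently.
TOOK_RE = re.compile(r"PERFORMANCE_ALERT:.*Took (\d+) milliseconds")
LANGUAGE_RE = re.compile(r"\* Language: (\S+)")
CHARS_RE = re.compile(r"\* Number of characters: (\d+)")


@dataclass
class SlowWord:
    took_ms: int
    language: Optional[str] = None
    characters: Optional[int] = None


def parse_performance_alerts(lines: Iterable[str]) -> List[SlowWord]:
    """Collect one SlowWord per PERFORMANCE_ALERT, filling in the detail lines that follow."""
    alerts: List[SlowWord] = []
    current: Optional[SlowWord] = None
    for line in lines:
        match = TOOK_RE.search(line)
        if match:
            current = SlowWord(took_ms=int(match.group(1)))
            alerts.append(current)
            continue
        if current is None:
            continue
        match = LANGUAGE_RE.search(line)
        if match:
            current.language = match.group(1)
            continue
        match = CHARS_RE.search(line)
        if match:
            current.characters = int(match.group(1))
    return alerts


# Hypothetical usage:
#   with open("SystemOut.log", encoding="utf-8") as log:
#       for alert in parse_performance_alerts(log):
#           print(alert.language, alert.characters, alert.took_ms)
```

Sorting the result by took_ms makes it easy to see whether the slow cases are all long German words, as Tiny Support described.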

Testing the same with Hunspell enabled

[7/12/22 20:10:12:798 UTC] 00000134 SystemOut     O 2022-07-12 20:10:12.798Z [ioapp-compute-4] DEBUG ironbark - request [ uuid-0958072a-bf4c-4cb6-8acd-e2e7e8fb2870 ] Spellall (7 words) (BEGIN)
[7/12/22 20:10:12:798 UTC] 00000134 SystemOut     O 2022-07-12 20:10:12.798Z [ioapp-compute-4] DEBUG ironbark - request [ uuid-0958072a-bf4c-4cb6-8acd-e2e7e8fb2870 ] Spellall (7 words) (END)
[7/12/22 20:10:12:800 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.800Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-199c0261-36e8-4173-807a-13a4a8ebce6b ] Spellall (0.0 % - 0 / 1 incorrect)
[7/12/22 20:10:12:801 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.800Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-199c0261-36e8-4173-807a-13a4a8ebce6b ] Spellall (1 words) (BEGIN)
[7/12/22 20:10:12:801 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.801Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-199c0261-36e8-4173-807a-13a4a8ebce6b ] Spellall (1 words) (END)
[7/12/22 20:10:12:801 UTC] 00000134 SystemOut     O 2022-07-12 20:10:12.801Z [ioapp-compute-4] INFO  ironbark - request [ uuid-4655f8a9-a466-4ad4-8874-d91f5fc8fc9b ] Response => Status: 200 OK (18 ms)
[7/12/22 20:10:12:801 UTC] 0000012f SystemOut     O 2022-07-12 20:10:12.801Z [ioapp-compute-1] INFO  ironbark - request [ uuid-9117e582-90f5-4246-bd17-56d00c12b975 ] Request => POST /tiny-spelling/2/suggestions
[7/12/22 20:10:12:803 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.803Z [ioapp-compute-2] INFO  ironbark - request [ uuid-938f2c71-701b-4312-9183-426b81829297 ] Response => Status: 200 OK (16 ms)
[7/12/22 20:10:12:803 UTC] 00000133 SystemOut     O 2022-07-12 20:10:12.803Z [ioapp-compute-3] INFO  ironbark - request [ uuid-67b5ef69-2079-43eb-908f-bd0017f715e2 ] Request => POST /tiny-spelling/2/suggestions
[7/12/22 20:10:12:808 UTC] 0000012f SystemOut     O 2022-07-12 20:10:12.808Z [ioapp-compute-1] DEBUG ironbark - request [ uuid-202c5596-0fae-4cc4-8793-7feb458b3b0c ] Incoming suggestions-V2 request for: 1 word(s) in language: de from API Key: none
[7/12/22 20:10:12:811 UTC] 0000012f SystemOut     O 2022-07-12 20:10:12.811Z [ioapp-compute-1] DEBUG ironbark - request [ uuid-202c5596-0fae-4cc4-8793-7feb458b3b0c ] Spellall (0.0 % - 0 / 1 incorrect)
[7/12/22 20:10:12:811 UTC] 0000012f SystemOut     O 2022-07-12 20:10:12.811Z [ioapp-compute-1] DEBUG ironbark - request [ uuid-202c5596-0fae-4cc4-8793-7feb458b3b0c ] Spellall (1 words) (BEGIN)
[7/12/22 20:10:12:812 UTC] 0000012f SystemOut     O 2022-07-12 20:10:12.811Z [ioapp-compute-1] DEBUG ironbark - request [ uuid-202c5596-0fae-4cc4-8793-7feb458b3b0c ] Spellall (1 words) (END)
[7/12/22 20:10:12:814 UTC] 0000012f SystemOut     O 2022-07-12 20:10:12.814Z [ioapp-compute-1] INFO  ironbark - request [ uuid-9117e582-90f5-4246-bd17-56d00c12b975 ] Response => Status: 200 OK (13 ms)
[7/12/22 20:10:12:817 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.817Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-8f7cc6c4-60f7-4535-ad4a-04a49aa4b389 ] Incoming suggestions-V2 request for: 1 word(s) in language: de from API Key: none
[7/12/22 20:10:12:819 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.819Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-8f7cc6c4-60f7-4535-ad4a-04a49aa4b389 ] Spellall (0.0 % - 0 / 1 incorrect)
[7/12/22 20:10:12:819 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.819Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-8f7cc6c4-60f7-4535-ad4a-04a49aa4b389 ] Spellall (1 words) (BEGIN)
[7/12/22 20:10:12:819 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.819Z [ioapp-compute-2] DEBUG ironbark - request [ uuid-8f7cc6c4-60f7-4535-ad4a-04a49aa4b389 ] Spellall (1 words) (END)
[7/12/22 20:10:12:822 UTC] 00000132 SystemOut     O 2022-07-12 20:10:12.821Z [ioapp-compute-2] INFO  ironbark - request [ uuid-67b5ef69-2079-43eb-908f-bd0017f715e2 ] Response => Status: 200 OK (18 ms)
[7/12/22 20:10:12:854 UTC] 00000133 SystemOut     O 2022-07-12 20:10:12.854Z [ioapp-compute-3] INFO  ironbark - request [ uuid-0b4d740e-a06b-4a1a-a75e-e6a680a2d41d ] Request => POST /tiny-spelling/2/suggestions
[7/12/22 20:10:12:860 UTC] 00000135 SystemOut     O 2022-07-12 20:10:12.860Z [ioapp-compute-5] DEBUG ironbark - request [ uuid-15c6b13d-a3f7-4fbb-8b43-6bf6a6074b26 ] Incoming suggestions-V2 request for: 1 word(s) in language: de from API Key: none
[7/12/22 20:10:12:862 UTC] 00000135 SystemOut     O 2022-07-12 20:10:12.862Z [ioapp-compute-5] DEBUG ironbark - request [ uuid-15c6b13d-a3f7-4fbb-8b43-6bf6a6074b26 ] Spellall (0.0 % - 0 / 1 incorrect)
[7/12/22 20:10:12:862 UTC] 00000135 SystemOut     O 2022-07-12 20:10:12.862Z [ioapp-compute-5] DEBUG ironbark - request [ uuid-15c6b13d-a3f7-4fbb-8b43-6bf6a6074b26 ] Spellall (1 words) (BEGIN)
[7/12/22 20:10:12:862 UTC] 00000135 SystemOut     O 2022-07-12 20:10:12.862Z [ioapp-compute-5] DEBUG ironbark - request [ uuid-15c6b13d-a3f7-4fbb-8b43-6bf6a6074b26 ] Spellall (1 words) (END)
[7/12/22 20:10:12:864 UTC] 00000135 SystemOut     O 2022-07-12 20:10:12.864Z [ioapp-compute-5] INFO  ironbark - request [ uuid-0b4d740e-a06b-4a1a-a75e-e6a680a2d41d ] Response => Status: 200 OK (10 ms)

So for German spellchecking, it appears that Hunspell works faster and gives suggestions even for long words. No high CPU load or waiting messages have appeared so far. I never thought about these long German words until I read the answer from Tiny Support. If your users write documents in Connections in German, I would suggest you change the spellchecker too.
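
To put rough numbers on the comparison, the response times can be pulled out of the "Response => Status" lines of both log excerpts. This is only a sketch based on the log format shown above; the function names response_times_ms and summarize are made up for this example.

```python
import re
from typing import Dict, Iterable, List

# Matches lines such as: Response => Status: 200 OK (18 ms)
RESPONSE_RE = re.compile(r"Response => Status: \d{3} [^(]*\((\d+) ms\)")


def response_times_ms(lines: Iterable[str]) -> List[int]:
    """Extract the duration in milliseconds from every response line."""
    times = []
    for line in lines:
        match = RESPONSE_RE.search(line)
        if match:
            times.append(int(match.group(1)))
    return times


def summarize(times: List[int]) -> Dict[str, float]:
    """Return the count, maximum, and mean of the given durations."""
    if not times:
        return {"count": 0, "max_ms": 0, "mean_ms": 0.0}
    return {"count": len(times), "max_ms": max(times), "mean_ms": sum(times) / len(times)}


# Hypothetical usage:
#   with open("SystemOut.log", encoding="utf-8") as log:
#       print(summarize(response_times_ms(log)))
```

A handful of requests is of course not a benchmark, but running this over a full day of logs before and after the switch gives a fairer picture than single excerpts.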

Christoph Stoettner
Author
I work at Vegard IT GmbH as a senior consultant, focusing on collaboration software, Kubernetes, security, and automation. I primarily work with HCL Connections, WebSphere Application Server, Kubernetes, Ansible, Terraform, and Linux. My daily work occasionally leads to technical talks and blog articles, which I share here more or less regularly.
