News from CTK

ČTK expands automated generation of news: petrol prices soon to be followed by traffic accident statistics

17.06.2020

The ČTK has recently started using automatically generated texts in its news reports on petrol prices, and it also plans to automate its coverage of monthly traffic accident statistics at the turn of July. Automation relieves editors of routine work, speeds up the writing of news and reduces the risk of errors in data transcription, Radka Matesová Marková, the editor-in-chief of the ČTK news desk, said at the Journalists’ Forum in Prague-Karlín today.

The ČTK applied automated generation of news for the first time during the local and Senate elections in 2018, releasing 200 such stories. The national news agency thus became a pioneer in robotic journalism among Czech media. The agency develops the tools for automated generation of texts in its IT department in cooperation with editors. It also collaborates with the Geneea Analytics company.

All of the ČTK’s automatically generated texts are checked and, if need be, further edited by editors. Automatically generated news items in the ČTK’s news service carry the signature "rur", after the famous play by Karel Čapek in which the word "robot" was used for the first time.

"The first petrol price test showed that thanks to the automated generation of texts we can release the entire news series about 40 minutes faster than usual. If the system turns out as reliable and we resolve to release the first versions of these news reports automatically, ČTK clients should receive them as soon as they are published on the CCS website. They will thus receive the entire series two hours sooner than they have so far," said the editor-in-chief. Robotic journalism in the environment of a news agency means a system that is able to generate text news in the ordinary language based on data available from a data source (e.g. the Czech Statistical Office) or from a combination of several sources. Such news reports can be released automatically or can be assigned to an editor for editing.

"Regarding news reports on automotive fuel prices, our idea for the foreseeable future is that they could be the first example of automatically generated news that would come out in their basic version without being edited by an editor," Matesová Marková said, adding that an editor would then work on the extended version of the story. The decision on releasing news without a check by an editor will be crucial and will depend on long-lasting testing of the system reliability in live operation, she said. In the future, automatically generated news could also be used for regularly published statistical data, such as the development of GDP, inflation, tourism, industry and construction.

The ČTK also used automatically generated texts internally during the first weeks of the coronavirus crisis this year. In cooperation with editors, the ČTK’s IT department prepared simple templates for headline and flash news coverage using statistical data from the Health Ministry’s website.

"In this case, we used the rur more as a notification. With regard to the dynamics of development, news could not be always written in the same way, so the template character soon turned out as inappropriate," said the editor-in-chief. The automated processing of the Health Ministry’s data was nevertheless of great help to editors, as brief template texts generated by the programme whenever the data changed were transferred to the ČTK’s editorial system, where editors could process them further.

Several months ago, ČTK editors also started testing a system for automated generation of news reports from the Prague Stock Exchange developed within a research project of the Charles University Faculty of Social Sciences, the Czech Technical University in Prague and the University of West Bohemia. “On our part, the work was halted by the coronavirus, but we will return to it,” said Matesová Marková.

Apart from creating simple templates, the ČTK has also become involved in developing more advanced data-driven text generation based on statistical text processing. However, IT specialists often struggle with the complexity of the Czech language. “Czech is not only a complex language but also a small language, so there is often a shortage of suitable texts for machine learning. Even the content of the ČTK Infobank is not sufficient,” the editor-in-chief noted.