Resolves: tdf#125110 tdf#151211 Disentangle the convoluted CSV/TSV-clip import

The chain of fixes for #i119960# tdf#48621 tdf#125440 produced
code that is suboptimal and not robust enough against some further
corner cases, taking field content as quoted where it should not
be.

First, in ReadCsvLine(), assume the following: if a generator is
broken enough to start a field with a quote, follow it with an
unescaped embedded quote, and never write a closing quote (i.e.
one immediately before a field delimiter) before the line end,
then it will not be clever enough to write embedded linefeeds
either. The field's starting quote was then not a real one and is
to be taken literally, like all other quotes, up to the now
unquoted field end. In this case do not read a subsequent source
line for the current row.
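
As a rough illustration of that line-level heuristic, here is a minimal standalone sketch. It is not the actual ReadCsvLine() code: the function name needsNextLine, its parameters and the plain std::string_view input are assumptions made for this example, and it simply gives up on continuation as soon as one broken field is found.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of the LibreOffice code base.
// Returns true if `line` ends inside a well-formed, still open quoted field,
// i.e. the next source line should be appended to the current row.
bool needsNextLine(std::string_view line, char sep = ',', char quote = '"')
{
    std::size_t pos = 0; // start of the current field
    while (pos < line.size())
    {
        if (line[pos] != quote)
        {
            // Unquoted field: skip to just past the next separator.
            const std::size_t next = line.find(sep, pos);
            if (next == std::string_view::npos)
                return false;
            pos = next + 1;
            continue;
        }

        bool strayQuote = false; // saw an unescaped embedded quote
        bool closed = false;     // saw a quote right before separator or line end
        std::size_t i = pos + 1;
        while (i < line.size() && !closed)
        {
            if (line[i] != quote)
            {
                ++i;
                continue;
            }
            const bool atEnd = (i + 1 == line.size());
            if (!atEnd && line[i + 1] == quote)
                i += 2; // doubled quote: one escaped quote
            else if (atEnd || line[i + 1] == sep)
                closed = true; // proper closing quote, i stays on it
            else
            {
                strayQuote = true; // unescaped quote in the middle of the field
                ++i;
            }
        }

        if (!closed)
        {
            // Open field at line end. With a stray quote the generator is
            // assumed broken: the start quote is literal, no continuation.
            return !strayQuote;
        }
        pos = i + 2; // skip the closing quote and the separator
    }
    return false;
}
```

Under these assumptions, `a,"multi` asks for a continuation line, while `a,"b"c` does not, because the unescaped quote before `c` marks the field as garbage that is taken literally.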

Then, for the individual fields of a row, make a similar
assumption: a quote-started field has to end with a quote before a
field separator (or line end); otherwise all quotes of that field
are literal data up to the next field separator.
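
The per-field rule can be sketched roughly as follows. This is a simplified standalone example, not the code touched by this change: the names tryQuotedField and splitRow, the std::string based types and the choice to keep a stray embedded quote as data inside a properly closed field are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: tries to read a quoted field whose opening quote is at
// line[start]. On success returns the unquoted content and the index just
// past the closing quote; returns nullopt if no closing quote is found.
std::optional<std::pair<std::string, std::size_t>>
tryQuotedField(std::string_view line, std::size_t start, char sep, char quote)
{
    std::string content;
    std::size_t i = start + 1;
    while (i < line.size())
    {
        if (line[i] != quote)
        {
            content += line[i++];
            continue;
        }
        const bool atEnd = (i + 1 == line.size());
        if (!atEnd && line[i + 1] == quote)
        {
            content += quote; // doubled quote: one escaped quote
            i += 2;
        }
        else if (atEnd || line[i + 1] == sep)
            return std::make_pair(std::move(content), i + 1); // closing quote
        else
            content += line[i++]; // stray quote, kept as data (assumption)
    }
    return std::nullopt; // never closed before line end
}

// Hypothetical helper: splits one row into fields using the rule above.
std::vector<std::string> splitRow(std::string_view line, char sep = ',', char quote = '"')
{
    std::vector<std::string> fields;
    std::size_t pos = 0;
    while (true)
    {
        std::optional<std::pair<std::string, std::size_t>> quoted;
        if (pos < line.size() && line[pos] == quote)
            quoted = tryQuotedField(line, pos, sep, quote);

        std::size_t end; // index of the separator after this field, or line end
        if (quoted)
        {
            fields.push_back(std::move(quoted->first));
            end = quoted->second;
        }
        else
        {
            // Unquoted field, or a quote-started field that never closed:
            // take everything up to the next separator literally.
            end = line.find(sep, pos);
            if (end == std::string_view::npos)
                end = line.size();
            fields.emplace_back(line.substr(pos, end - pos));
        }
        if (end >= line.size())
            break;
        pos = end + 1;
    }
    return fields;
}
```

With this sketch, `"a,b",c` yields the fields `a,b` and `c`, whereas `"a"b,c` has no closing quote before a separator or the line end, so it yields `"a"b` (quotes kept as literal data) and `c`.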

This made it necessary to adapt two test cases of the garbage CSV
import test to produce different garbage than before.

Change-Id: I4424b65c87c7f9dcbe717a7e6cf207352cb613f3
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/140850
Reviewed-by: Eike Rathke <erack@redhat.com>
Tested-by: Jenkins
2 files changed
tree: 3859564c8389cc391a6f06c1e9bb04e76a1c7c44
README.md

LibreOffice

LibreOffice is an integrated office suite based on copyleft licenses and compatible with most document formats and standards. LibreOffice is backed by The Document Foundation, which represents a large independent community of enterprises, developers and other volunteers moved by the common goal of bringing to the market the best software for personal productivity. LibreOffice is open source, and free to download, use and distribute.

A quick overview of the LibreOffice code structure.

Overview

You can develop for LibreOffice in one of two ways, one recommended and one much less so. First the somewhat less recommended way: it is possible to use the SDK to develop an extension, for which you can read the API docs and Developers Guide. This re-uses the (extremely generic) UNO APIs that are also used by macro scripting in StarBasic.

However, the best way to add a generally useful feature to LibreOffice is to work on the code base. Overall, this way makes it easier to compile and build your code, avoids any arbitrary limitations of our scripting APIs, and is in general far simpler and more intuitive, if you are a reasonably able C++ programmer.

The Build Chain and Runtime Baselines

These are the current minimal operating system and compiler versions to run and compile LibreOffice, also used by the TDF builds:

  • Windows:
    • Runtime: Windows 7
    • Build: Cygwin + Visual Studio 2019 version 16.10
  • macOS:
    • Runtime: 10.14
    • Build: 11.0 + Xcode 12.5
  • Linux:
    • Runtime: RHEL 7 or CentOS 7
    • Build: either GCC 7.0.0; or Clang 8.0.1 with libstdc++ 7.3.0
  • iOS (only for LibreOfficeKit):
    • Runtime: 11.4 (only newer, 64-bit iDevices are supported)
    • Build: Xcode 9.3 and iPhone SDK 11.4
  • Android:
    • Build: NDK r19c and SDK 22.6.2
  • Emscripten / WASM:
    • Runtime: a browser with SharedMemory support (threads + atomics)
    • Build: Qt 5.15 with Qt supported Emscripten 1.39.8
    • See README.wasm

Java is required for building many parts of LibreOffice. The TDF Wiki article Development/Java lists the exact modules that depend on Java.

The baseline for Java is Java Development Kit (JDK) Version 11 or later. It is possible to build LibreOffice with JDK version 9, but it is no longer supported by the JDK vendors, thus it should be avoided.

If you want to use Clang with the LibreOffice compiler plugins, the minimal version of Clang is 12.0.1. Since Xcode doesn't provide the compiler plugin headers, you have to compile your own Clang to use them on macOS.

You can find the TDF configure switches in the distro-configs/ directory.

To set up your initial build environment on Windows and macOS, we provide the LibreOffice Development Environment (LODE) scripts.

For more information see the build instructions for your platform in the TDF wiki.

The Important Bits of Code

Each module should have a README.md file inside it which has some degree of documentation for that module; patches are most welcome to improve those. We have those turned into a web page here:

https://docs.libreoffice.org/

However, there are two hundred modules, many of them of only peripheral interest for a specialist audience. So where is the good stuff, the code that is most useful? Here is a quick overview of the most important modules:

| Module | Description |
| --- | --- |
| sal/ | this provides a simple System Abstraction Layer |
| tools/ | this provides basic internal types: Rectangle, Color etc. |
| vcl/ | this is the widget toolkit library and one rendering abstraction |
| framework/ | UNO framework, responsible for building toolbars, menus, status bars, and the chrome around the document using widgets from VCL, and XML descriptions from /uiconfig/ files |
| sfx2/ | legacy core framework used by Writer/Calc/Draw: document model / load/save / signals for actions etc. |
| svx/ | drawing model related helper code, including much of Draw/Impress |

Then applications

| Module | Description |
| --- | --- |
| desktop/ | this is where the main() for the application lives, init / bootstrap. the name dates back to an ancient StarOffice that also drew a desktop |
| sw/ | Writer |
| sc/ | Calc |
| sd/ | Draw / Impress |

There are several other libraries that are helpful from a graphical perspective:

| Module | Description |
| --- | --- |
| basegfx/ | algorithms and data-types for graphics as used in the canvas |
| canvas/ | new (UNO) canvas rendering model with various backends |
| cppcanvas/ | C++ helper classes for using the UNO canvas |
| drawinglayer/ | View code to render drawable objects and break them down into primitives we can render more easily. |

Rules for #include Directives (C/C++)

Use the "..." form if and only if the included file is found next to the including file. Otherwise, use the <...> form. (For further details, see the mail Re: C[++]: Normalizing include syntax ("" vs <>).)

The UNO API include files should consistently use double quotes, for the benefit of external users of this API.

loplugin:includeform (compilerplugins/clang/includeform.cxx) enforces these rules.
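
As a small illustration of these rules, consider the fragment below. The module name mymodule and the files widget.cxx, widget.hxx and widgetfactory.hxx are hypothetical and only serve the example; rtl/ustring.hxx is an existing LibreOffice header. The fragment is not meant to compile outside a matching source tree.

```cpp
// Hypothetical file: mymodule/source/widget.cxx

// widget.hxx sits in the same directory as this file, so use the "..." form.
#include "widget.hxx"

// Everything found elsewhere via the include path uses the <...> form. That
// covers this module's own headers in another directory as well as headers
// from other modules and the standard library.
#include <mymodule/widgetfactory.hxx>
#include <rtl/ustring.hxx>
#include <vector>
```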

Finding Out More

Beyond this, you can read the README.md files, send us patches, ask on the mailing list libreoffice@lists.freedesktop.org (no subscription required) or poke people on IRC #libreoffice-dev on irc.libera.chat - we're a friendly and generally helpful mob. We know the code can be hard to get into at first, and so there are no silly questions.