Veteran XML Engineer Reminisces on Challenges and Rewards

Explore the insights of Tatu Saloranta, a veteran XML engineer, as he reflects on the challenges, frustrations, and rewards of creating tools for XML processing. Gain knowledge on XML, related data formats like JSON, and valuable lessons learned over the past decade.

Presentation Transcript


  1. XML London 2013: XML Veteran Reminisces
     Tatu Saloranta
     Principal Engineer at Salesforce.com
     Open Source Activist (@cowtowncoder), author of Woodstox, Jackson

  2. "XML Veteran Reminisces"
     Creating tools for XML processing can be challenging, frustrating and rewarding -- often simultaneously. But you will learn a lot, beyond the data format itself or its immediate applicability. I will cover things learnt over the past 10 years and share my views on subjects around XML and related textual data formats (JSON), especially regarding things I consider over- or undervalued.

  3. Author and XML ("It's complicated")
     • Programming since 80s: Basic -> machine code -> C -> C++ -> Java
     • HTML/SGML/WWW/Linux, College, mid-90s:
        o XML in 1998; QuarkXPress 5.0 web, XML, 1999
        o Content Authoring systems at Sun, 2001-2004
        o First XML parser, QnD (yes, quick-n-dirty)
     • 2004: Woodstox XML parser (Java, Stax)
        o 2005: StaxMate ("perfect companion for Stax")
     • 2005: joined Amazon.com
        o Multiple services that communicate over XML (PoX)

  4. 0. Disclaimers
     • Talk about Data-oriented XML processing, not Document-oriented (aka "markup"):
        o XML works well, no real contest for doc-oriented
        o Haven't worked with Doc-oriented XML since 2005 (Sun, Content authoring systems)
     • My knowledge of the XML state of the art is lacking: less involved since 2008 -- except for Async parsing (Aalto), Mobile (Android)
     • Abstract, philosophical, high-level

  5. Overview: Theses
     1. No One Cares If It Is XML
     2. DOM is overused, overrated
     3. Performance perception-based, overvalued
     4. Schemas: to Use or Not to Use is the real Q
     5. Complexity: the Real enemy
     6. XML could be simpler; JSON perhaps too simple

  6. "No One Cares If It Is XML"
     • ... nor should they, fundamentally, care a lot
     • Implementation detail more than a Feature
        o Early fanboi-ism ("hey, X uses XML!") -- form over function -- Open/documented matters more
     • Usage should drive design
     • Common mistakes
        o Overfocus on formatting -- "Pretty Snowflake" XML: cumbersome to use, no tooling even if it looks good on paper (in spec)
        o Assumption of using specific tools (XSLT)
        o No end-to-end view, lacks interoperability
     • Would you eat their dog food if they don't? (Ning)

  7. "No One Cares If It Is XML" (2)
     XML Validation: useless, even harmful?
     • validation at format level (XML validation)
        o neither necessary (handle at levels below & above)
        o nor sufficient (business logic limits)
        o often inconvenient compared to prog lang; or domain-specific system
     • close coupling b/w biz logic, representation
     • validate by processing (do you really trust 'it validates'?)
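
One way to read "validate by processing" is that the code consuming the data enforces the constraints that matter, instead of trusting a format-level validator. The following is a minimal Python sketch under stated assumptions: the `<order>` document, the `parse_order` function and the "quantity must be positive" rule are all hypothetical and do not come from the talk.

```python
import xml.etree.ElementTree as ET

def parse_order(xml_text):
    """Parse a hypothetical <order> document, enforcing rules while processing."""
    # Well-formedness is checked by the parser, at the level below business logic.
    root = ET.fromstring(xml_text)
    if root.tag != "order":
        raise ValueError(f"expected <order>, got <{root.tag}>")
    sku = root.findtext("sku")
    if not sku:
        raise ValueError("missing <sku>")
    try:
        quantity = int(root.findtext("quantity", default=""))
    except ValueError:
        raise ValueError("<quantity> must be an integer") from None
    # Hypothetical business rule: a schema could express this one, but many
    # real rules (stock levels, customer limits) live outside the document.
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return {"sku": sku, "quantity": quantity}
```

A document can be well-formed and still fail here, which is the sense in which format-level validation is "nor sufficient".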

  8. "No One Cares If It Is XML" (3)
     So what DOES matter with XML (or Other Textual Data Formats -- *cough*JSON*cough*)?
     1. Developer debuggable (text vs binary)
     2. Physical vs Logical model separation
        a. modularity (process at right level of abstraction)
        b. alternate representations like Binary XML
     3. Robustness via redundancy
     4. Flexibility: schemas optional (or multiple schemas)

  9. "DOM is overused, overrated"
     Historically DOM (and other Tree Models) seen as a cornerstone of XML (web browser?). But it
     • is JUST ONE of processing models
     • adds lots of overhead: 3x mem (textual XML or native object), 2-10x processing overhead (vs streaming)
        o not always problematic (small data, browsers)
     • is untyped, XML-centric, inconvenient and often WRONG tool for the job.
     Still taught as the canonical model?

  10. "DOM is overused, overrated" (2)
      Alternative to DOM as the foundational block: streaming (for Java, SAX, Stax)
      • XSLT may be able to use (sometimes)
      • XQuery can typically
      • Data-binding (JAXB, Jackson XML module) can use streaming
         o Newer Java Web Services use data-binding
      • Or use directly or via helper libs (StaxMate for Java)
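
The contrast between a tree model and incremental processing can be sketched with Python's standard library. This is an illustration only: `xml.dom.minidom` stands in for DOM, and `ElementTree.iterparse` stands in for the streaming APIs named on the slide (SAX, Stax), although it is not identical to them because it still builds element objects as it goes. The `<items>` document and both function names are made up for the example.

```python
import io
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical sample document: five <item> elements with prices 1..5.
XML = "<items>" + "".join(f"<item price='{i}'/>" for i in range(1, 6)) + "</items>"

def total_with_dom(xml_text):
    """Tree model: the whole document is materialized before any work starts."""
    doc = minidom.parseString(xml_text)
    return sum(int(e.getAttribute("price")) for e in doc.getElementsByTagName("item"))

def total_streaming(xml_text):
    """Incremental model: handle each element as it completes, then drop its content."""
    total = 0
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            total += int(elem.get("price"))
            elem.clear()  # discard the element's attributes and children once counted
    return total

print(total_with_dom(XML), total_streaming(XML))
```

Both functions compute the same total. The difference is that the second one never needs the complete tree at once, which matters as documents grow.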

  11. "DOM is overused, overrated" (3)
      Focus on higher-level abstractions, regardless of internal representation
      • XQuery (or, XSLT if you must)
      • Data-binding (native objects)
      • ... other nice abstractions we'll learn about later today?
      DOM (tree model) useful as a conceptual model: not so much as a tool.

  12. "Performance is perception-based, overvalued"
      Thesis: Nowadays DATA IN ANY FORMAT can be processed FAST ENOUGH!
      • with proper model, tools (... not DOM)
         o ... or, even with wrong ones, with fast hardware...
      • tool/implementation quality has big impact -- theoretical limits rarely matter
      • cherry-picking, apples-to-oranges leads to claims like "protobuf 100x faster than XML" (XSLT or DOM?)

  13. "Performance is perception-based..." (2)
      So why do so many devs still want to "Use Protobuf/MsgPack/... to speed things up"? ... without verifying they HAVE A PROBLEM?
      Especially when binary formats:
      • Are hard(er) to debug: not dev-debuggable without specific tools (or schema)
      • Brittle: a minor truncation makes data unreadable, without knowing what/how/when
      • Schema-ridden: lose schema, lose data
      • Less interoperable (JavaScript anyone?)
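
The debuggability and brittleness points can be illustrated with a small Python sketch. It is not a benchmark and it does not use Protobuf or MessagePack: a fixed `struct` layout stands in for a schema-driven binary encoding, and the record, the `LAYOUT` string and `decode_binary` are hypothetical names invented for the example.

```python
import json
import struct

# Hypothetical record and layout, invented for this sketch.
record = {"id": 7, "name": "widget", "price": 9.5}
LAYOUT = "<I6sd"  # the "schema": uint32 id, 6-byte name, float64 price

as_text = json.dumps(record).encode("utf-8")
as_binary = struct.pack(
    LAYOUT, record["id"], record["name"].encode("utf-8"), record["price"]
)

def decode_binary(data):
    """Decoding needs both the exact layout and the complete payload."""
    return struct.unpack(LAYOUT, data)

# Simulate a minor truncation of both payloads.
truncated_text = as_text[:-4]
truncated_binary = as_binary[:-4]

# The text is no longer valid JSON, but a developer can still read most of it.
print(truncated_text.decode("utf-8"))

# The binary payload fails to decode, and the bytes alone say little about why.
try:
    decode_binary(truncated_binary)
except struct.error as exc:
    print("binary decode failed:", exc)
```

Without `LAYOUT`, the intact binary payload is just 18 opaque bytes, which is one reading of "lose schema, lose data".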

  14. "Performance is perception-based..." (3)
      It's all about perception!
      • Everybody "knows" XML is Slow
         o Except comparing best-of-breed tools
         o or when you consider developer productivity
      • and "Big, Bloated"
         o except if you compress data; XML compresses better, especially longer streams
         o while not equivalent (still more information, wrt names), much closer
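
The compression claim is easy to check on your own data. The Python sketch below is a measuring harness, not evidence for the slide's numbers: the 1000 sample rows are invented, `zlib` is just one possible compressor, and real payloads may behave differently.

```python
import json
import zlib

# Hypothetical sample data: 1000 small records, invented for this sketch.
rows = [{"id": i, "name": f"item{i}", "price": i * 1.5} for i in range(1000)]

as_json = json.dumps(rows).encode("utf-8")
as_xml = (
    "<rows>"
    + "".join(
        f"<row><id>{r['id']}</id><name>{r['name']}</name><price>{r['price']}</price></row>"
        for r in rows
    )
    + "</rows>"
).encode("utf-8")

def sizes(payload):
    """Return (raw size, zlib-compressed size) in bytes."""
    return len(payload), len(zlib.compress(payload, 9))

for label, payload in (("JSON", as_json), ("XML", as_xml)):
    raw, packed = sizes(payload)
    print(f"{label}: {raw} bytes raw, {packed} bytes compressed ({packed / raw:.1%})")
```

Comparing the two printed ratios on representative data is more informative than assuming either format is "bloated".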

  15. "Schemas: to Use or Not to Use?"
      Growing (ca 2008...) assumption of Inevitability of explicit and mandatory (XML)Schema usage. (& implicit assumption "schema always good")
      Is this true? Two things to consider:
      1. Explicit (XML Schema) vs Implicit/Inferred schema (for data-binding)
      2. Ability to NOT use Schema is a Feature:
         a. many binary formats require (not all)
         b. Schemas make data brittle, rigid: free-

  16. "Schemas: to Use or Not to Use?" (2)
      • Historical note: Schemas came from Validation (XSD as DTD replacement)
         o But most value for XML Schema comes from Schema as Type Definition!
      • Ironically enough, RelaxNG great for validation, not very useful for Type Definitions
         o same true for "other schemas", I presume (Schematron etc)
         o put another way: RNG for Document/markup-oriented usage; XSD for data

  17. "Schemas: to Use or Not to Use?" (3)
      Alternatives to explicit, external schema, in context of Type Definitions
      • Usage: if I access thing as Number, it is (or becomes) one (XSLT, XQuery) -- dynamic typing
      • Other artifacts ("things you need anyway")
         o Object definitions like Java classes (databind, JAXB)
         o Protocol definitions (IDLs), with mapping rules
      • May generate schema; caveat: O/X impedance
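
The idea of an object definition acting as the implicit schema can be sketched in Python with a dataclass. This is a toy stand-in for data-binding tools such as JAXB or Jackson, not a description of how they work: the `Point` class and the `bind` helper are hypothetical, and the helper only handles flat documents whose field types are simple callables like `int` and `str`.

```python
from dataclasses import dataclass, fields
import xml.etree.ElementTree as ET

@dataclass
class Point:
    """Hypothetical target type; its fields double as the implicit schema."""
    x: int
    y: int
    label: str

def bind(cls, xml_text):
    """Bind child elements to dataclass fields, converting by declared type."""
    root = ET.fromstring(xml_text)
    values = {}
    for field in fields(cls):
        raw = root.findtext(field.name)
        if raw is None:
            raise ValueError(f"missing <{field.name}>")
        # "If I access it as a number, it becomes one": convert on access.
        values[field.name] = field.type(raw)
    return cls(**values)

point = bind(Point, "<point><x>1</x><y>2</y><label>origin-ish</label></point>")
print(point)
```

No external schema file is involved. The class is something the program needs anyway, which is the slide's point about "other artifacts".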

  18. "Complexity: the Real enemy"
      • Perception: XML haters fully appreciate complexity of XML processing!
      • but XML "drivers" seem(ed) unconcerned
         o "serious systems are complex, complicated, hard"
         o "they just don't yet understand the beauty"
         o "more expressive power (always) good"
      • perception becomes a fact ("it is if it appears to")
      • wrong tool for the job: XSL for business logic, validation (Amazon horror stories)

  19. "Complexity: the Real enemy" (2)
      Fighting for simplicity:
      • Modularity of XML applications
         o XPath from XSL
         o Small specifications (XML id)
         o Incremental improvements
      • Accept "non-XML" tools (no biz-logic in XSL) -> interoperability
      • Borrow from others (SQL); interoperate
      • Declarative good? Often yes, not always ("extend until implement Emacs, imperfectly")

  20. "XML could be simpler; JSON too simple"
      JSON arguably IS simpler for Data:
      • Small set of real scalar types (numbers, booleans), not just Strings
      • True Map/Object vs Array/List distinction
      • No mixed content (thank god)
      • No Element/Attribute distinction
      • Simpler parsing allows more effort in building better data-binding tools
         o and experience from XML tools also helps...
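
The point about scalar types and the Object/Array distinction shows up as soon as both formats are parsed with no schema. A short Python sketch, using a made-up document with `count`, `active` and `tags` fields:

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical data in both formats.
from_json = json.loads('{"count": 3, "active": true, "tags": ["a", "b"]}')
root = ET.fromstring(
    "<doc><count>3</count><active>true</active>"
    "<tags><tag>a</tag><tag>b</tag></tags></doc>"
)

# JSON: the parser alone yields a number, a boolean and a list.
print(type(from_json["count"]), type(from_json["active"]), type(from_json["tags"]))

# XML: without a schema or binding rules, every value is text, and whether
# <tags> is a list or a single nested object is a matter of convention.
print(repr(root.findtext("count")), repr(root.findtext("active")))
print([tag.text for tag in root.find("tags")])
```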

  21. "XML could be simpler; JSON too simple" (2)
      But could it be too simple? Alas, yes:
      • No comments (you kidding me, right?) -- was included initially, removed due to alleged abuse
      • Data/metadata (hint: element/attribute!) distinction actually USEFUL for:
         o type information (host language Object types)
         o object identity information
      • Tools need to rely exclusively on naming conventions
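
The data/metadata point can be made concrete with a small Python sketch. In XML, type information can ride along as an attribute, separate from the element content. In JSON, it has to share the object with the data, so a tool must reserve a key by convention. The `shape` document and the `@type` key below are illustrative choices for this example, not something the slides prescribe.

```python
import json
import xml.etree.ElementTree as ET

# XML: the attribute carries metadata, child elements carry data.
xml_shape = ET.fromstring('<shape type="circle"><radius>2</radius></shape>')
xml_type = xml_shape.get("type")
xml_data = {child.tag: child.text for child in xml_shape}

# JSON: metadata lives next to the data, told apart only by a naming convention.
json_shape = json.loads('{"@type": "circle", "radius": 2}')
json_type = json_shape.get("@type")
json_data = {key: value for key, value in json_shape.items() if not key.startswith("@")}

print(xml_type, xml_data)
print(json_type, json_data)
```

The JSON version works only as long as no real data field ever needs a name starting with `@`, which is the cost of relying exclusively on naming conventions.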

  22. The End
      And That's All, Folks!
      Comments, questions, feedback: tsaloranta@gmail.com OR @cowtowncoder
