Introduction to Scala: XML Processing and Beyond
Scala is a programming language used by companies such as LinkedIn, Apple and Twitter. This presentation, from XML London 2014, covers XML processing in Scala and the language's wider applications. It shows Scala's features, modular design and performance, where it is used, and why developers choose it. It highlights the concise, functional and type-safe features that make Scala a good fit for robust, efficient software.
Presentation Transcript
XML Processing in Scala
William Narmontas, Dino Fancellu
www.scala.contractors
XML London 2014
Dino Fancellu: 35 years in IT; Scala, Java, XML
William Narmontas: 10 years in IT; Scala, XML, Web
Scala is modular, concise, functional, type-safe, performant, object-oriented, strongly typed, statically typed, unopinionated, composable and Java-interoperable, with first-class XML support.
Who uses Scala?
LinkedIn, The Guardian, Apple, eBay, Morgan Stanley, TomTom, Bank of America, eHarmony, Netflix, Trafigura, Barclays, EDF, Novell, Tumblr, BBC, FourSquare, Rackspace, Twitter, BSkyB, Gawker, Sky, UBS, Cisco, HSBC, Sony, VMware, Citigroup, ITV, Springer, Xerox, Credit Suisse, Klout
Projects in Scala
- Less code to write means less code to maintain
- Clearer communication
- Easier testing
- More robust software
- Faster time to market
- Happier developers
Values
Scala:
val conferenceName = "XML London 2014"
XQuery:
let $conferenceName := "XML London 2014"
Scala (mutable):
var conferenceName = "XML London 2014"
conferenceName = "XML London 2015"
Strings
val language = "Scala"
s"XML Processing in $language"
| XML Processing in Scala
s"""An introduction to:
   |The "$language" programming language""".stripMargin
| An introduction to:
| The "Scala" programming language
s"$language has ${language.length} chars in its name"
| Scala has 5 chars in its name
Functions
Scala:
def fun(x: Int, y: Double) = s"$x: $y"
XQuery:
declare function local:fun($x as xs:integer, $y as xs:double) as xs:string {
  concat($x, ": ", $y)
};
Everything is an expression
val trainSpeed = if (train.speed.mph >= 60) "Fast" else "Slow"
def divide(numerator: Int, denominator: Int) =
  try {
    s"${numerator / denominator}"
  } catch {
    case _: java.lang.ArithmeticException => s"Cannot divide $numerator by $denominator"
  }
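As a minimal sketch, the slide's expressions can be wrapped in an object so they compile outside the REPL. `speedLabel` is a hypothetical stand-in for `train.speed.mph`, which the slide does not define.

```scala
object Expressions {
  // `if` is an expression, so its result can be returned or assigned directly.
  // speedLabel is a hypothetical stand-in for the slide's train.speed.mph check.
  def speedLabel(mph: Int): String = if (mph >= 60) "Fast" else "Slow"

  // try/catch is an expression too: the try block and the handler both yield a String.
  def divide(numerator: Int, denominator: Int): String =
    try {
      s"${numerator / denominator}"
    } catch {
      case _: java.lang.ArithmeticException =>
        s"Cannot divide $numerator by $denominator"
    }

  def main(args: Array[String]): Unit = {
    println(speedLabel(75)) // Fast
    println(divide(10, 2))  // 5
    println(divide(1, 0))   // Cannot divide 1 by 0
  }
}
```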
Types: Explicit
def withTitle(name: String, title: String): String = s"$title. $name"
val x: Int = {
  val y = 1000
  100 + y
}
| x: Int = 1100
Functions: named parameters
Further clarity in method calls:
def makeLink(url: String, text: String) = s"""<a href="$url">$text</a>"""
makeLink(text = "XML London 2014", url = "http://www.xmllondon.com")
| <a href="http://www.xmllondon.com">XML London 2014</a>
Functions: default parameters
Reduce repetition in method calls:
def withTitle(name: String, title: String = "Mr") = s"$title. $name"
withTitle("John Smith")
| Mr. John Smith
withTitle("Mary Smith", "Miss")
| Miss. Mary Smith
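Named and default parameters can also be combined. The sketch below reuses the two functions from these slides. The `"Dr"`/`"Jane Smith"` call is an illustrative addition.

```scala
object Parameters {
  // `title` has a default value, so callers may omit it.
  def withTitle(name: String, title: String = "Mr"): String = s"$title. $name"

  def makeLink(url: String, text: String): String =
    s"""<a href="$url">$text</a>"""

  def main(args: Array[String]): Unit = {
    println(withTitle("John Smith"))                      // Mr. John Smith
    // Named arguments combine with defaults and may appear in any order.
    println(withTitle(title = "Dr", name = "Jane Smith")) // Dr. Jane Smith
    println(makeLink(text = "XML London 2014", url = "http://www.xmllondon.com"))
  }
}
```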
Functional
def incrementedByOne(x: Int) = x + 1
(1 to 5).map(incrementedByOne)
| Vector(2, 3, 4, 5, 6)
Lambdas
(1 to 5).map(x => x + 1)
| Vector(2, 3, 4, 5, 6)
(1 to 5).map(_ + 1)
| Vector(2, 3, 4, 5, 6)
For comprehensions
for { x <- (1 to 5) } yield x + 1
| Vector(2, 3, 4, 5, 6)
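The three forms above are interchangeable. The following sketch puts them side by side. It also shows how a guard inside a for-comprehension corresponds to a `withFilter` call before `map`. The object and value names are illustrative.

```scala
object Comprehensions {
  def incrementedByOne(x: Int): Int = x + 1

  // Three equivalent ways to add one to every element of 1 to 5.
  val viaNamedFunction = (1 to 5).map(incrementedByOne)
  val viaLambda        = (1 to 5).map(_ + 1)
  val viaFor           = for (x <- 1 to 5) yield x + 1

  // A guard (`if`) inside a for-comprehension is translated to withFilter before map.
  val evensPlusOne          = for (x <- 1 to 5 if x % 2 == 0) yield x + 1
  val evensPlusOneDesugared = (1 to 5).withFilter(x => x % 2 == 0).map(x => x + 1)

  def main(args: Array[String]): Unit = {
    println(viaFor)       // Vector(2, 3, 4, 5, 6)
    println(evensPlusOne)
  }
}
```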
Implicit classes: enrich types
implicit class stringWrapper(str: String) {
  def wrapWithParens = s"($str)"
}
"Text".wrapWithParens
| (Text)
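The slide's `implicit class` works as typed in the REPL. In a compiled Scala 2 source file, however, an implicit class must be a member of an object, class or trait. Here is a minimal sketch; `StringSyntax` and `ImplicitDemo` are illustrative names. Scala 3 also offers `extension` methods for the same purpose.

```scala
object StringSyntax {
  // Implicit classes must be members of an object, class or trait (not top-level in Scala 2).
  // Extending AnyVal makes this a value class, which usually avoids allocating a wrapper.
  implicit class StringWrapper(val str: String) extends AnyVal {
    def wrapWithParens: String = s"($str)"
  }
}

object ImplicitDemo {
  // Importing the implicit class brings wrapWithParens into scope for every String.
  import StringSyntax._

  val wrapped: String = "Text".wrapWithParens

  def main(args: Array[String]): Unit =
    println(wrapped) // (Text)
}
```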
Powerful features for scalability
- Case classes
- Traits
- Partial functions
- Pattern matching
- Implicits
- Flexible syntax
- Generics
- User-defined operators
- Call-by-name
- Macros
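To make three of these features concrete, here is a small sketch combining traits, case classes and pattern matching. The `Shape` hierarchy is hypothetical and not taken from the slides.

```scala
object Features {
  // A sealed trait: all direct subtypes are in this file, so matches can be checked for exhaustiveness.
  sealed trait Shape
  final case class Circle(radius: Double) extends Shape
  final case class Rectangle(width: Double, height: Double) extends Shape

  // Pattern matching deconstructs each case class into its fields.
  def area(shape: Shape): Double = shape match {
    case Circle(r)       => math.Pi * r * r
    case Rectangle(w, h) => w * h
  }

  def main(args: Array[String]): Unit = {
    List(Circle(1.0), Rectangle(2.0, 3.0)).foreach(s => println(s"$s has area ${area(s)}"))
    // Case classes provide structural equality and copy for free.
    println(Rectangle(2.0, 3.0) == Rectangle(2.0, 3.0)) // true
    println(Rectangle(2.0, 3.0).copy(height = 4.0))     // Rectangle(2.0,4.0)
  }
}
```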
Values: Inline XML
val url = "http://www.xmllondon.com"
val title = "XML London 2014"
val xmlTree = <div>
  <p>Welcome to <a href={url}>{title}</a>!</p>
</div>
| xmlTree: scala.xml.Elem =
| <div>
|   <p>Welcome to <a href="http://www.xmllondon.com">XML London 2014</a>!</p>
| </div>
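The following sketch builds a larger tree from data. It assumes the scala-xml module is on the classpath; since Scala 2.11 the module ships separately, e.g. `"org.scala-lang.modules" %% "scala-xml"`. The link list is illustrative. Strings embedded with `{...}` are escaped when the tree is serialized, so they cannot inject markup.

```scala
import scala.xml.Elem

object InlineXml {
  // Illustrative link data, not taken from the slides.
  val links: Seq[(String, String)] = Seq(
    "http://www.xmllondon.com" -> "XML London 2014",
    "http://www.scala-lang.org" -> "Scala"
  )

  // Braces embed Scala expressions; a Seq of elements is spliced in as children.
  val list: Elem =
    <ul>{links.map { case (url, title) => <li><a href={url}>{title}</a></li> }}</ul>

  // Embedded strings are escaped on serialization.
  val escaped: Elem = <p>{"1 < 2 & 3 > 2"}</p>

  def main(args: Array[String]): Unit = {
    println(list)
    println(escaped) // <p>1 &lt; 2 &amp; 3 &gt; 2</p>
  }
}
```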
XML Lookups
val listOfPeople = <people>
  <person>Fred</person>
  <person>Ron</person>
  <person>Nigel</person>
</people>
listOfPeople \ "person"
| NodeSeq(<person>Fred</person>, <person>Ron</person>, <person>Nigel</person>)
listOfPeople \ "_"
| NodeSeq(<person>Fred</person>, <person>Ron</person>, <person>Nigel</person>)
XML Lookups
val fact = <fact type="universal">
  <variable>A</variable> = <variable>A</variable>
</fact>
fact \\ "variable"
| NodeSeq(<variable>A</variable>, <variable>A</variable>)
fact \ "@type"
| : scala.xml.NodeSeq = universal
fact \@ "type"
| : String = universal
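Here is a runnable sketch of the lookup operators, again assuming scala-xml is on the classpath. The `nested` element is a hypothetical addition that shows where `\` and `\\` differ.

```scala
import scala.xml.{Elem, NodeSeq}

object Lookups {
  val listOfPeople: Elem =
    <people><person>Fred</person><person>Ron</person><person>Nigel</person></people>

  val fact: Elem =
    <fact type="universal"><variable>A</variable> = <variable>A</variable></fact>

  // Hypothetical nested document: person is a grandchild of org, not a child.
  val nested: Elem = <org><team><person>Ada</person></team></org>

  // `\` looks at direct children only; `\\` searches all descendants.
  val names: List[String] = (listOfPeople \ "person").map(_.text).toList
  val variables: NodeSeq = fact \\ "variable"
  val directPeople: NodeSeq = nested \ "person"  // empty
  val deepPeople: NodeSeq = nested \\ "person"   // finds Ada

  // `\@` returns the attribute value as a String, or "" when the attribute is absent.
  val factType: String = fact \@ "type"
  val missing: String = fact \@ "missing"

  def main(args: Array[String]): Unit = {
    println(names)          // List(Fred, Ron, Nigel)
    println(variables.size) // 2
    println(factType)       // universal
  }
}
```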
XML Loading
val pun = """<pun rating="extreme">
            | <question>Why do CompSci students need glasses?</question>
            | <answer>To C#<!-- C# is a Microsoft programming language -->.</answer>
            |</pun>""".stripMargin
scala.xml.XML.loadString(pun)
| <pun rating="extreme">
| <question>Why do CompSci students need glasses?</question>
| <answer>To C#.</answer>
| </pun>
Collections: expressive
val root = <numbers>
  {for {i <- 1 to 10} yield <number>{i}</number>}
</numbers>
val numbers = root \ "number"
numbers(0)
| <number>1</number>
numbers.head
| <number>1</number>
numbers.last
| <number>10</number>
numbers take 3
| NodeSeq(<number>1</number>, <number>2</number>, <number>3</number>)
Collections: expressive
numbers filter (_.text.toInt > 6)
| NodeSeq(<number>7</number>, <number>8</number>, <number>9</number>, <number>10</number>)
numbers(_.text.toInt > 6)
| NodeSeq(<number>7</number>, <number>8</number>, <number>9</number>, <number>10</number>)
numbers maxBy (_.text)
| <number>9</number>
numbers maxBy (_.text.toInt)
| <number>10</number>
numbers.reverse
| NodeSeq(<number>10</number>, <number>9</number>, <number>8</number>, <number>7</number>, <number>6</number>, <number>5</number>, <number>4</number>, <number>3</number>, <number>2</number>, <number>1</number>)
numbers.groupBy(_.text.toInt % 3)
| Map(
|   2 -> NodeSeq(<number>2</number>, <number>5</number>, <number>8</number>),
|   1 -> NodeSeq(<number>1</number>, <number>4</number>, <number>7</number>, <number>10</number>),
|   0 -> NodeSeq(<number>3</number>, <number>6</number>, <number>9</number>))
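The `maxBy (_.text)` result above comes from a string comparison, which is why `<number>9</number>` beats `<number>10</number>`. This sketch assumes scala-xml and converts the text to `Int` once, then uses ordinary collection operations. The value names are illustrative.

```scala
import scala.xml.{Elem, NodeSeq}

object NumberCollections {
  val root: Elem = <numbers>{for (i <- 1 to 10) yield <number>{i}</number>}</numbers>
  val numbers: NodeSeq = root \ "number"

  // maxBy on the raw text compares strings, so "9" is greater than "10".
  val textMax: String = numbers.maxBy(_.text).text

  // Converting once to Int gives numeric semantics for the remaining operations.
  val values: Seq[Int] = numbers.map(_.text.toInt)
  val numericMax: Int = values.max
  val total: Int = values.sum
  val aboveSix: Seq[Int] = values.filter(_ > 6)
  val byRemainder: Map[Int, Seq[Int]] = values.groupBy(_ % 3)

  def main(args: Array[String]): Unit = {
    println(s"text max: $textMax, numeric max: $numericMax, total: $total")
    println(aboveSix)
  }
}
```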
XML Methods: a rich API ++ :\ andThen buildString companion copyToBuffer distinct endsWith flatten genericBuilder headOption inits isTraversableAgain lastIndexWhere max nameToString par product reduceRightOption sameElements seq sorted stringPrefix takeWhile toIndexedSeq toSet union xmlType zipWithIndex ++: \ apply canEqual compose corresponds doCollectNamespaces exists fold getNamespace indexOf intersect iterator lastOption maxBy namespace partition reduce repr scan size span sum text toIterable toStream unzip xml_!= +: \@ applyOrElse child contains count doTransform filter foldLeft groupBy indexOfSlice isAtom label length min nonEmpty patch reduceLeft reverse scanLeft slice splitAt tail theSeq toIterator toString unzip3 xml_== /: \\ asInstanceOf collect containsSlice descendant drop filterNot foldRight grouped indexWhere isDefinedAt last lengthCompare minBy nonEmptyChildren permutations reduceLeftOption reverseIterator scanRight sliding startsWith tails to toList toTraversable updated xml_sameElements /:\ addString attribute collectFirst copy descendant_or_self dropRight find forall hasDefiniteSize indices isEmpty lastIndexOf lift minimizeEmpty orElse prefix reduceOption reverseMap scope sortBy strict_!= take toArray toMap toVector view zip % :+ aggregate attributes combinations copyToArray diff dropWhile flatMap foreach head init isInstanceOf lastIndexOfSlice map mkString padTo prefixLength reduceRight runWith segmentLength sortWith strict_== takeRight toBuffer toSeq transpose withFilter zipAll
For-comprehensions: similar to XQuery
XQuery:
<bib>{
  for $b in $xml/book
  let $year := $b/@year
  where $b/publisher = "Addison-Wesley" and $year > 1991
  return <book year="{ $year }">
    { $b/title }
  </book>
}</bib>
Scala:
<bib>{
  for {
    b <- xml \ "book"
    year = b \@ "year"
    if (b \ "publisher").text == "Addison-Wesley" && year.toInt > 1991
  } yield <book year={ year }>
    { b \ "title" }
  </book>
}</bib>
Nice! ... and yet Scala is general-purpose.
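Here is a runnable version of the Scala query, assuming scala-xml is on the classpath. The `books` document is hypothetical sample data. The publisher test compares the selected node's `.text` with `==`, and the year attribute is converted with `.toInt` before the numeric comparison.

```scala
import scala.xml.Elem

object Bibliography {
  // Hypothetical sample data in the shape the query expects.
  val books: Elem =
    <bib>
      <book year="1994"><title>First Book</title><publisher>Addison-Wesley</publisher></book>
      <book year="1990"><title>Second Book</title><publisher>Addison-Wesley</publisher></book>
      <book year="2000"><title>Third Book</title><publisher>Another Publisher</publisher></book>
    </bib>

  // XQuery for / let / where / return map onto generator / value definition / guard / yield.
  val result: Elem =
    <bib>{
      for {
        b <- books \ "book"
        year = b \@ "year"
        if (b \ "publisher").text == "Addison-Wesley" && year.toInt > 1991
      } yield <book year={year}>{b \ "title"}</book>
    }</bib>

  def main(args: Array[String]): Unit =
    println(result)
}
```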
Hybrid XML
- XQuery for Scala
- javax.xml.* for free:
  - Look up: XPath
  - Transform: XSLT
  - Stream: StAX
XQuery for Scala (XQS)
- Wraps the XQuery API for Java (javax.xml.xquery)
- Scala access to XQuery in MarkLogic, BaseX, Saxon, Sedna, eXist
- Converts DOM to Scala XML and vice versa
- http://github.com/fancellu/xqs
XQuery via XQS
import scala.xml.NodeSeq
import com.felstar.xqs.XQS._
val widgets = <widgets>
  <widget>Menu</widget>
  <widget>Status bar</widget>
  <widget id="panel-1">Panel</widget>
  <widget id="panel-2">Panel</widget>
</widgets>
val conn = new net.xqj.basex.local.BaseXXQDataSource().getConnection
val nodes: NodeSeq = conn("for $w in /widgets/widget order by $w return $w", widgets)
| NodeSeq(<widget>Menu</widget>, <widget id="panel-1">Panel</widget>,
|         <widget id="panel-2">Panel</widget>, <widget>Status bar</widget>)
XPath
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.NodeList
import scala.xml.NodeSeq
import com.felstar.xqs.XQS._ // provides toDom and the NodeList => NodeSeq conversion used below
val widgets = <widgets>
  <widget>Menu</widget>
  <widget>Status bar</widget>
  <widget id="panel-1">Panel</widget>
  <widget id="panel-2">Panel</widget>
</widgets>
val xpath = XPathFactory.newInstance().newXPath()
val nodes = xpath.evaluate("/widgets/widget[not(@id)]", toDom(widgets), XPathConstants.NODESET).asInstanceOf[NodeList]
(nodes: NodeSeq)
| NodeSeq(<widget>Menu</widget>, <widget>Status bar</widget>)
Natively in Scala:
(widgets \ "widget")(widget => (widget \ "@id").isEmpty)
| NodeSeq(<widget>Menu</widget>, <widget>Status bar</widget>)
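The same XPath query can also run with nothing but the JDK, with no XQS or scala-xml required. Here is a minimal sketch that parses a string into a DOM and evaluates the expression. The `select` helper is hypothetical.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.NodeList
import org.xml.sax.InputSource

object PlainXPath {
  val widgets: String =
    """<widgets>
      |  <widget>Menu</widget>
      |  <widget>Status bar</widget>
      |  <widget id="panel-1">Panel</widget>
      |  <widget id="panel-2">Panel</widget>
      |</widgets>""".stripMargin

  // Parse the XML string into a DOM, evaluate the XPath expression against it,
  // and return the text content of each matching node.
  def select(xml: String, expression: String): List[String] = {
    val doc = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse(new InputSource(new StringReader(xml)))
    val xpath = XPathFactory.newInstance().newXPath()
    val nodes = xpath.evaluate(expression, doc, XPathConstants.NODESET).asInstanceOf[NodeList]
    (0 until nodes.getLength).map(i => nodes.item(i).getTextContent).toList
  }

  def main(args: Array[String]): Unit =
    println(select(widgets, "/widgets/widget[not(@id)]")) // List(Menu, Status bar)
}
```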
XSLT
import javax.xml.transform.TransformerFactory
import javax.xml.transform.stream.StreamResult
import com.felstar.xqs.XQS._ // brings in the Scala XML => javax.xml.transform.Source conversions used below
val stylesheet = <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="john">
    <xsl:copy>Hello, John.</xsl:copy>
  </xsl:template>
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
val peopleXml = <people>
  <john>Hello, John.</john>
  <smith>Smith is here.</smith>
  <another>Hello.</another>
</people>
val xmlResultResource = new java.io.StringWriter()
val xmlTransformer = TransformerFactory.newInstance().newTransformer(stylesheet)
xmlTransformer.transform(peopleXml, new StreamResult(xmlResultResource))
xmlResultResource.getBuffer
| <?xml version="1.0" encoding="UTF-8"?><people>
|   <john>Hello, John.</john>
|   <smith>Smith is here.</smith>
|   <another>Hello.</another>
| </people>
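This is a JDK-only sketch of a similar transformation. It reads both the stylesheet and the input from strings via `StreamSource`. It assumes the JDK's built-in processor, which implements XSLT 1.0, so the stylesheet declares `version="1.0"`. The input document, where `<john>` starts with different text so the rewrite is visible, and the `transform` helper are illustrative.

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.transform.TransformerFactory
import javax.xml.transform.stream.{StreamResult, StreamSource}

object PlainXslt {
  // Identity transform plus a template that rewrites the content of <john>.
  val stylesheet: String =
    """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      |  <xsl:template match="john">
      |    <xsl:copy>Hello, John.</xsl:copy>
      |  </xsl:template>
      |  <xsl:template match="node()|@*">
      |    <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>
      |  </xsl:template>
      |</xsl:stylesheet>""".stripMargin

  // Hypothetical input document.
  val people: String =
    "<people><john>Hi</john><smith>Smith is here.</smith></people>"

  // Compile the stylesheet, run it over the input and return the serialized result.
  def transform(xslt: String, xml: String): String = {
    val transformer = TransformerFactory.newInstance()
      .newTransformer(new StreamSource(new StringReader(xslt)))
    val out = new StringWriter()
    transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out))
    out.toString
  }

  def main(args: Array[String]): Unit = println(transform(stylesheet, people))
}
```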
XML Stream Processing
// 4GB file, comes back in a second
import scala.io.Source
import javax.xml.stream.{XMLEventReader, XMLInputFactory}
import javax.xml.stream.events.XMLEvent
val src = Source.fromURL("http://dumps.wikimedia.org/enwiki/20140402/enwiki-20140402-abstract.xml")
val er = XMLInputFactory.newInstance().createXMLEventReader(src.reader())
// Adapt the StAX reader to a Scala Iterator so dropWhile, take, zipWithIndex etc. apply
implicit class XMLEventIterator(ev: XMLEventReader) extends scala.collection.Iterator[XMLEvent] {
  def hasNext = ev.hasNext
  def next() = ev.nextEvent()
}
er.dropWhile(!_.isStartElement).take(10).zipWithIndex.foreach {
  case (ev, idx) => println(s"${idx + 1}:\t$ev")
}
src.close()
| 1: <feed>
| 2:
|
| 3: <doc>
| 4:
|
| 5: <title>
| 6: Wikipedia: Anarchism
| 7: </title>
| 8:
|
| 9: <url>
| 10: http://en.wikipedia.org/wiki/Anarchism
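StAX pulls events on demand, and `take(10)` stops after ten of them, so only the beginning of the file has to be read. Here is a self-contained sketch of the same iterator adapter, run against an in-memory string. The `startElementNames` helper and the sample feed are hypothetical.

```scala
import java.io.StringReader
import javax.xml.stream.{XMLEventReader, XMLInputFactory}
import javax.xml.stream.events.XMLEvent

object PlainStax {
  // Adapts a StAX XMLEventReader to a Scala Iterator so collection methods apply.
  implicit class XMLEventIterator(ev: XMLEventReader) extends Iterator[XMLEvent] {
    def hasNext: Boolean = ev.hasNext
    def next(): XMLEvent = ev.nextEvent()
  }

  // Returns the local names of the first `limit` start elements; events are pulled lazily,
  // so reading stops once `limit` names have been collected.
  def startElementNames(xml: String, limit: Int): List[String] = {
    val reader = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml))
    try {
      reader
        .filter(_.isStartElement)
        .map(_.asStartElement.getName.getLocalPart)
        .take(limit)
        .toList
    } finally reader.close()
  }

  def main(args: Array[String]): Unit = {
    val feed = "<feed><doc><title>A</title></doc><doc><title>B</title></doc></feed>"
    println(startElementNames(feed, 3)) // List(feed, doc, title)
  }
}
```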
Use Cases
- Data extraction
- Serving XML via REST
- Dynamically generated XSLT
- Interfacing with XML databases
- Flexibility to choose the best tool for the job
Excellent Ecosystem
SBT, Akka, Spark, Spray, Specs, scalaz, scala-xml, shapeless, Scaladin, ScalaTest, macro-paradise, scala-maven-plugin, JVM
Conclusion
- Practical
- Practical for XML processing
Where do I start?
- atomicscala.com
- typesafe.com/activator
- scala-lang.org
- scala-ide.org
- IntelliJ
Matt Stephens, Charles Foster
Open to consulting: www.scala.contractors
Follow us on Twitter: @DinoFancellu @ScalaWilliam @MaffStephens