
Added RubyPants (the Ruby port of John Gruber's SmartyPants) to intelligently replace primes with smart quotes in the pullquote plugin, fixes #316

Brandon Mathis authored on 11/12/2011 at 22:20:04
Showing 2 changed files
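Replacing straight quotes "intelligently" means choosing the curl from context. The same `'` should become an opening quote (`&#8216;`) in one place and a closing quote or apostrophe (`&#8217;`) in another. As a rough illustration, here is a hypothetical helper, `curl_single_quotes`. It is not part of RubyPants. It applies a simplified version of the single-quote rules from the `educate_quotes` method added below, in the same order.

```ruby
# Hypothetical, simplified subset of RubyPants#educate_quotes (single quotes only).
def curl_single_quotes(text)
  s = text.dup
  # Decade abbreviations such as '80s take an apostrophe (closing quote).
  s.gsub!(/'(?=\d\ds)/, '&#8217;')
  # After whitespace and before a word character: opening quote.
  s.gsub!(/(\s)'(?=\w)/, '\1&#8216;')
  # After any character other than whitespace or ( [ { -: closing quote.
  s.gsub!(/([^\s\[{(\-])'/, '\1&#8217;')
  # Before whitespace, a trailing "s", or end of line: closing quote.
  s.gsub!(/'(\s|s\b|$)/, '&#8217;\1')
  # Whatever is left (e.g. a quote at the very start) is treated as opening.
  s.gsub!("'", '&#8216;')
  s
end

puts curl_single_quotes("He said 'hi' in the '80s")
# prints: He said &#8216;hi&#8217; in the &#8217;80s
```

Run on `'Twas the night`, the helper returns `&#8216;Twas the night`, an opening quote. This mirrors the leading-contraction shortcoming described in the RubyPants documentation: nothing before the apostrophe signals that it should close.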
@@ -4,7 +4,7 @@
 #
 # Outputs a span with a data-pullquote attribute set from the marked pullquote. Example:
 #
-#   {% pullquote %} 
+#   {% pullquote %}
 #     When writing longform posts, I find it helpful to include pullquotes, which help those scanning a post discern whether or not a post is helpful.
 #     It is important to note, {" pullquotes are merely visual in presentation and should not appear twice in the text. "} That is why it is preferred
 #     to use a CSS only technique for styling pullquotes.
@@ -33,7 +33,9 @@ module Jekyll
     def render(context)
       output = super
       if output.join =~ /\{"\s*(.+)\s*"\}/
-        @quote = $1
+        #@quote = $1
+        @quote = RubyPants.new($1).to_html
+        #@quote = CGI.escape($1)
         "<span class='pullquote-#{@align}' data-pullquote='#{@quote}'>#{output.join.gsub(/\{"\s*|\s*"\}/, '')}</span>"
       else
         return "Surround your pullquote like this {\" text to be quoted \"}"
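The `render` change matters because the quote text is interpolated into a single-quoted HTML attribute, `data-pullquote='...'`. A straight apostrophe in the pullquote (as in "it's") would end the attribute early and truncate the quote. That is likely the problem behind #316, though the issue text is not shown here. Curling it into the `&#8217;` entity avoids this.

Below is a minimal sketch of the same render logic, with two assumptions. First, it takes a plain string where the plugin joins the block's rendered `output`. Second, it uses a hypothetical `curl_quotes` helper as a simplified stand-in for `RubyPants.new($1).to_html`. The `align` default is also an assumption.

```ruby
# Simplified, hypothetical stand-in for RubyPants.new(text).to_html:
# it only curls straight quotes, enough to keep them out of a
# single-quoted HTML attribute.
def curl_quotes(text)
  s = text.gsub(/(\w)'/, '\1&#8217;')  # apostrophes and closing single quotes
  s = s.gsub("'", '&#8216;')           # remaining single quotes open
  s = s.gsub(/"(?=\w)/, '&#8220;')     # double quote before a word opens
  s.gsub('"', '&#8221;')               # remaining double quotes close
end

# Mirrors the plugin's render branch, but on a plain string.
def render_pullquote(body, align = 'right')
  if body =~ /\{"\s*(.+)\s*"\}/
    quote = curl_quotes($1)                     # text between {" and "}
    text = body.gsub(/\{"\s*|\s*"\}/, '')       # strip the markers from the body
    "<span class='pullquote-#{align}' data-pullquote='#{quote}'>#{text}</span>"
  else
    "Surround your pullquote like this {\" text to be quoted \"}"
  end
end

puts render_pullquote(%q(Intro. {" It's simple. "} Outro.))
```

Without the curling step, the attribute would read `data-pullquote='It's simple. '`. An HTML parser takes that as the value `It` followed by stray attribute text. With it, the attribute is `data-pullquote='It&#8217;s simple. '`, while the visible span text keeps its original apostrophe.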
new file mode 100644
@@ -0,0 +1,489 @@
+#
+# = RubyPants -- SmartyPants ported to Ruby
+#
+# Ported by Christian Neukirchen <mailto:chneukirchen@gmail.com>
+#   Copyright (C) 2004 Christian Neukirchen
+#
+# Incorporates ideas, comments and documentation by Chad Miller
+#   Copyright (C) 2004 Chad Miller
+#
+# Original SmartyPants by John Gruber
+#   Copyright (C) 2003 John Gruber
+#
+
+#
+# = RubyPants -- SmartyPants ported to Ruby
+#
+# == Synopsis
+#
+# RubyPants is a Ruby port of the smart-quotes library SmartyPants.
+#
+# The original "SmartyPants" is a free web publishing plug-in for
+# Movable Type, Blosxom, and BBEdit that easily translates plain ASCII
+# punctuation characters into "smart" typographic punctuation HTML
+# entities.
+#
+#
+# == Description
+#
+# RubyPants can perform the following transformations:
+#
+# * Straight quotes (<tt>"</tt> and <tt>'</tt>) into "curly" quote
+#   HTML entities
+# * Backtick-style quotes (<tt>``like this''</tt>) into "curly" quote
+#   HTML entities
+# * Dashes (<tt>--</tt> and <tt>---</tt>) into en- and em-dash
+#   entities
+# * Three consecutive dots (<tt>...</tt> or <tt>. . .</tt>) into an
+#   ellipsis entity
+#
+# This means you can write, edit, and save your posts using plain old
+# ASCII straight quotes, plain dashes, and plain dots, but your
+# published posts (and final HTML output) will appear with smart
+# quotes, em-dashes, and proper ellipses.
+#
+# RubyPants does not modify characters within <tt><pre></tt>,
+# <tt><code></tt>, <tt><kbd></tt>, <tt><math></tt> or
+# <tt><script></tt> tag blocks. Typically, these tags are used to
+# display text where smart quotes and other "smart punctuation" would
+# not be appropriate, such as source code or example markup.
+#
+#
+# == Backslash Escapes
+#
+# If you need to use literal straight quotes (or plain hyphens and
+# periods), RubyPants accepts the following backslash escape sequences
+# to force non-smart punctuation. It does so by transforming the
+# escape sequence into a decimal-encoded HTML entity:
+#
+#   \\    \"    \'    \.    \-    \`
+#
+# This is useful, for example, when you want to use straight quotes as
+# foot and inch marks: 6'2" tall; a 17" iMac.  (Use <tt>6\'2\"</tt>
+# and <tt>17\"</tt>, respectively.)
+#
+#
+# == Algorithmic Shortcomings
+#
+# One situation in which quotes will get curled the wrong way is when
+# apostrophes are used at the start of leading contractions. For
+# example:
+#
+#   'Twas the night before Christmas.
+#
+# In the case above, RubyPants will turn the apostrophe into an
+# opening single-quote, when in fact it should be a closing one. I
+# don't think this problem can be solved in the general case--every
+# word processor I've tried gets this wrong as well. In such cases,
+# it's best to use the proper HTML entity for closing single-quotes
+# ("<tt>&#8217;</tt>") by hand.
+#
+#
+# == Bugs
+#
+# To file bug reports or feature requests (other than the shortcoming
+# described above), please send email to: mailto:chneukirchen@gmail.com
+#
+# If the bug involves quotes being curled the wrong way, please send
+# example text to illustrate.
+#
+#
+# == Authors
+#
+# John Gruber did all of the hard work of writing this software in
+# Perl for Movable Type and almost all of this useful documentation.
+# Chad Miller ported it to Python to use with Pyblosxom.
+#
+# Christian Neukirchen provided the Ruby port, as a general-purpose
+# library that follows the *Cloth API.
+#
+#
+# == Copyright and License
+#
+# === SmartyPants license:
+#
+# Copyright (c) 2003 John Gruber
+# (http://daringfireball.net)
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+#
+# * Neither the name "SmartyPants" nor the names of its contributors
+#   may be used to endorse or promote products derived from this
+#   software without specific prior written permission.
+#
+# This software is provided by the copyright holders and contributors
+# "as is" and any express or implied warranties, including, but not
+# limited to, the implied warranties of merchantability and fitness
+# for a particular purpose are disclaimed. In no event shall the
+# copyright owner or contributors be liable for any direct, indirect,
+# incidental, special, exemplary, or consequential damages (including,
+# but not limited to, procurement of substitute goods or services;
+# loss of use, data, or profits; or business interruption) however
+# caused and on any theory of liability, whether in contract, strict
+# liability, or tort (including negligence or otherwise) arising in
+# any way out of the use of this software, even if advised of the
+# possibility of such damage.
+#
+# === RubyPants license
+#
+# RubyPants is a derivative work of SmartyPants and smartypants.py.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+#
+# This software is provided by the copyright holders and contributors
+# "as is" and any express or implied warranties, including, but not
+# limited to, the implied warranties of merchantability and fitness
+# for a particular purpose are disclaimed. In no event shall the
+# copyright owner or contributors be liable for any direct, indirect,
+# incidental, special, exemplary, or consequential damages (including,
+# but not limited to, procurement of substitute goods or services;
+# loss of use, data, or profits; or business interruption) however
+# caused and on any theory of liability, whether in contract, strict
+# liability, or tort (including negligence or otherwise) arising in
+# any way out of the use of this software, even if advised of the
+# possibility of such damage.
+#
+#
+# == Links
+#
+# John Gruber:: http://daringfireball.net
+# SmartyPants:: http://daringfireball.net/projects/smartypants
+#
+# Chad Miller:: http://web.chad.org
+#
+# Christian Neukirchen:: http://kronavita.de/chris
+#
+
+
+class RubyPants < String
+
+  # Create a new RubyPants instance with the text in +string+.
+  #
+  # Allowed elements in the options array:
+  #
+  # 0  :: do nothing
+  # 1  :: enable all, using only em-dash shortcuts
+  # 2  :: enable all, using old school en- and em-dash shortcuts (*default*)
+  # 3  :: enable all, using inverted old school en- and em-dash shortcuts
+  # -1 :: stupefy (translate HTML entities to their ASCII counterparts)
+  #
+  # If you don't like any of these defaults, you can pass symbols to change
+  # RubyPants' behavior:
+  #
+  # <tt>:quotes</tt>        :: quotes
+  # <tt>:backticks</tt>     :: backtick quotes (``double'' only)
+  # <tt>:allbackticks</tt>  :: backtick quotes (``double'' and `single')
+  # <tt>:dashes</tt>        :: dashes
+  # <tt>:oldschool</tt>     :: old school dashes
+  # <tt>:inverted</tt>      :: inverted old school dashes
+  # <tt>:ellipses</tt>      :: ellipses
+  # <tt>:convertquotes</tt> :: convert <tt>&quot;</tt> entities to
+  #                            <tt>"</tt> for Dreamweaver users
+  # <tt>:stupefy</tt>       :: translate RubyPants HTML entities
+  #                            to their ASCII counterparts.
+  #
+  def initialize(string, options=[2])
+    super string
+    @options = [*options]
+  end
+
+  # Apply SmartyPants transformations.
+  def to_html
+    do_quotes = do_backticks = do_dashes = do_ellipses = do_stupefy = nil
+    convert_quotes = false
+
+    if @options.include? 0
+      # Do nothing.
+      return self
+    elsif @options.include? 1
+      # Do everything, turn all options on.
+      do_quotes = do_backticks = do_ellipses = true
+      do_dashes = :normal
+    elsif @options.include? 2
+      # Do everything, turn all options on, use old school dash shorthand.
+      do_quotes = do_backticks = do_ellipses = true
+      do_dashes = :oldschool
+    elsif @options.include? 3
+      # Do everything, turn all options on, use inverted old school
+      # dash shorthand.
+      do_quotes = do_backticks = do_ellipses = true
+      do_dashes = :inverted
+    elsif @options.include?(-1)
+      do_stupefy = true
+    else
+      do_quotes =                @options.include? :quotes
+      do_backticks =             @options.include? :backticks
+      do_backticks = :both    if @options.include? :allbackticks
+      do_dashes = :normal     if @options.include? :dashes
+      do_dashes = :oldschool  if @options.include? :oldschool
+      do_dashes = :inverted   if @options.include? :inverted
+      do_ellipses =              @options.include? :ellipses
+      convert_quotes =           @options.include? :convertquotes
+      do_stupefy =               @options.include? :stupefy
+    end
+
+    # Parse the HTML
+    tokens = tokenize
+
+    # Keep track of when we're inside <pre>, <code>, <kbd>, <script> or <math> tags.
+    in_pre = false
+
+    # The result is accumulated here.
+    result = ""
+
+    # This is a cheat, used to get some context for one-character
+    # tokens that consist of just a quote char. What we do is remember
+    # the last character of the previous text token, to use as context
+    # to curl single-character quote tokens correctly.
+    prev_token_last_char = nil
+
+    tokens.each { |token|
+      if token.first == :tag
+        result << token[1]
+        if token[1] =~ %r!<(/?)(?:pre|code|kbd|script|math)[\s>]!
+          in_pre = ($1 != "/")  # Opening or closing tag?
+        end
+      else
+        t = token[1]
+
+        # Remember last char of this token before processing.
+        last_char = t[-1].chr
+
+        unless in_pre
+          t = process_escapes t
+
+          t.gsub!(/&quot;/, '"')  if convert_quotes
+
+          if do_dashes
+            t = educate_dashes t            if do_dashes == :normal
+            t = educate_dashes_oldschool t  if do_dashes == :oldschool
+            t = educate_dashes_inverted t   if do_dashes == :inverted
+          end
+
+          t = educate_ellipses t  if do_ellipses
+
+          # Note: backticks need to be processed before quotes.
+          if do_backticks
+            t = educate_backticks t
+            t = educate_single_backticks t  if do_backticks == :both
+          end
+
+          if do_quotes
+            if t == "'"
+              # Special case: single-character ' token
+              if prev_token_last_char =~ /\S/
+                t = "&#8217;"
+              else
+                t = "&#8216;"
+              end
+            elsif t == '"'
+              # Special case: single-character " token
+              if prev_token_last_char =~ /\S/
+                t = "&#8221;"
+              else
+                t = "&#8220;"
+              end
+            else
+              # Normal case:
+              t = educate_quotes t
+            end
+          end
+
+          t = stupefy_entities t  if do_stupefy
+        end
+
+        prev_token_last_char = last_char
+        result << t
+      end
+    }
+
+    # Done
+    result
+  end
+
+  protected
+
+  # Return the string after processing the following backslash
+  # escape sequences. This is useful if you want to force a "dumb" quote
+  # or other character to appear.
+  #
+  # The escape sequences are:
+  #      \\    \"    \'    \.    \-    \`
+  #
+  def process_escapes(str)
+    str.gsub('\\\\', '&#92;').
+      gsub('\"', '&#34;').
+      gsub("\\\'", '&#39;').
+      gsub('\.', '&#46;').
+      gsub('\-', '&#45;').
+      gsub('\`', '&#96;')
+  end
+
+  # Return the string, with each instance of "<tt>--</tt>" translated to an
+  # em-dash HTML entity.
+  #
+  def educate_dashes(str)
+    str.gsub(/--/, '&#8212;')
+  end
+
+  # Return the string, with each instance of "<tt>--</tt>" translated to an
+  # en-dash HTML entity, and each "<tt>---</tt>" translated to an
+  # em-dash HTML entity.
+  #
+  def educate_dashes_oldschool(str)
+    str.gsub(/---/, '&#8212;').gsub(/--/, '&#8211;')
+  end
+
+  # Return the string, with each instance of "<tt>--</tt>" translated
+  # to an em-dash HTML entity, and each "<tt>---</tt>" translated to
+  # an en-dash HTML entity. Two reasons why: First, unlike the en- and
+  # em-dash syntax supported by +educate_dashes_oldschool+, it's
+  # compatible with existing entries written before SmartyPants 1.1,
+  # back when "<tt>--</tt>" was only used for em-dashes.  Second,
+  # em-dashes are more common than en-dashes, and so it sort of makes
+  # sense that the shortcut should be shorter to type. (Thanks to
+  # Aaron Swartz for the idea.)
+  #
+  def educate_dashes_inverted(str)
+    str.gsub(/---/, '&#8211;').gsub(/--/, '&#8212;')
+  end
+
+  # Return the string, with each instance of "<tt>...</tt>" translated
+  # to an ellipsis HTML entity. Also converts the case where there are
+  # spaces between the dots.
+  #
+  def educate_ellipses(str)
+    str.gsub('...', '&#8230;').gsub('. . .', '&#8230;')
+  end
+
+  # Return the string, with "<tt>``backticks''</tt>"-style double quotes
+  # translated into HTML curly quote entities.
+  #
+  def educate_backticks(str)
+    str.gsub("``", '&#8220;').gsub("''", '&#8221;')
+  end
+
+  # Return the string, with "<tt>`backticks'</tt>"-style single quotes
+  # translated into HTML curly quote entities.
+  #
+  def educate_single_backticks(str)
+    str.gsub("`", '&#8216;').gsub("'", '&#8217;')
+  end
+
+  # Return the string, with "educated" curly quote HTML entities.
+  #
+  def educate_quotes(str)
+    punct_class = '[!"#\$\%\'()*+,\-.\/:;<=>?\@\[\\\\\]\^_`{|}~]'
+
+    str = str.dup
+
+    # Special case if the very first character is a quote followed by
+    # punctuation at a non-word-break. Close the quotes by brute
+    # force:
+    str.gsub!(/^'(?=#{punct_class}\B)/, '&#8217;')
+    str.gsub!(/^"(?=#{punct_class}\B)/, '&#8221;')
+
+    # Special case for double sets of quotes, e.g.:
+    #   <p>He said, "'Quoted' words in a larger quote."</p>
+    str.gsub!(/"'(?=\w)/, '&#8220;&#8216;')
+    str.gsub!(/'"(?=\w)/, '&#8216;&#8220;')
+
+    # Special case for decade abbreviations (the '80s):
+    str.gsub!(/'(?=\d\ds)/, '&#8217;')
+
+    close_class = %![^\ \t\r\n\\[\{\(\-]!
+    dec_dashes = '&#8211;|&#8212;'
+
+    # Get most opening single quotes:
+    str.gsub!(/(\s|&nbsp;|--|&[mn]dash;|#{dec_dashes}|&#x201[34];)'(?=\w)/,
+             '\1&#8216;')
+    # Single closing quotes:
+    str.gsub!(/(#{close_class})'/, '\1&#8217;')
+    str.gsub!(/'(\s|s\b|$)/, '&#8217;\1')
+    # Any remaining single quotes should be opening ones:
+    str.gsub!(/'/, '&#8216;')
+
+    # Get most opening double quotes:
+    str.gsub!(/(\s|&nbsp;|--|&[mn]dash;|#{dec_dashes}|&#x201[34];)"(?=\w)/,
+             '\1&#8220;')
+    # Double closing quotes:
+    str.gsub!(/(#{close_class})"/, '\1&#8221;')
+    str.gsub!(/"(\s|s\b|$)/, '&#8221;\1')
+    # Any remaining quotes should be opening ones:
+    str.gsub!(/"/, '&#8220;')
+
+    str
+  end
+
+  # Return the string, with each RubyPants HTML entity translated to
+  # its ASCII counterpart.
+  #
+  # Note: This is not reversible (but exactly the same as in SmartyPants)
+  #
+  def stupefy_entities(str)
+    str.
+      gsub(/&#8211;/, '-').      # en-dash
+      gsub(/&#8212;/, '--').     # em-dash
+
+      gsub(/&#8216;/, "'").      # open single quote
+      gsub(/&#8217;/, "'").      # close single quote
+
+      gsub(/&#8220;/, '"').      # open double quote
+      gsub(/&#8221;/, '"').      # close double quote
+
+      gsub(/&#8230;/, '...')     # ellipsis
+  end
+
+  # Return an array of the tokens comprising the string. Each token is
+  # either a tag (possibly with nested tags contained therein, such
+  # as <tt><a href="<MTFoo>"></tt>), or a run of text between
+  # tags. Each element of the array is a two-element array; the first
+  # is either :tag or :text; the second is the actual value.
+  #
+  # Based on the <tt>_tokenize()</tt> subroutine from Brad Choate's
+  # MTRegex plugin.  <http://www.bradchoate.com/past/mtregex.php>
+  #
+  # This is actually the easier variant using tag_soup, as used by
+  # Chad Miller in the Python port of SmartyPants.
+  #
+  def tokenize
+    tag_soup = /([^<]*)(<[^>]*>)/
+
+    tokens = []
+
+    prev_end = 0
+    scan(tag_soup) {
+      tokens << [:text, $1]  if $1 != ""
+      tokens << [:tag, $2]
+
+      prev_end = $~.end(0)
+    }
+
+    if prev_end < size
+      tokens << [:text, self[prev_end..-1]]
+    end
+
+    tokens
+  end
+end
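Taken together, `to_html` does four things:

* splits the input into tag and text tokens with `tokenize`;
* sets an `in_pre` flag on an opening `<pre>`, `<code>`, `<kbd>`, `<script>` or `<math>` tag;
* clears the flag on the matching closing tag;
* transforms only the text outside those elements.

The sketch below reproduces that loop for a subset of the transformations: backslash escapes, old-school dashes and ellipses. It should behave like `RubyPants.new(html, [:oldschool, :ellipses]).to_html`. The top-level functions and constants (`smarten`, `educate_text`, `TAG_SOUP`, `SKIP_TAG`) are hypothetical names, not part of the RubyPants class.

```ruby
# Hypothetical, trimmed-down version of the RubyPants#to_html loop.
TAG_SOUP = /([^<]*)(<[^>]*>)/
SKIP_TAG = %r{<(/?)(?:pre|code|kbd|script|math)[\s>]}

# Split into [:tag, "..."] and [:text, "..."] pairs, as RubyPants#tokenize does.
def tokenize(str)
  tokens = []
  prev_end = 0
  str.scan(TAG_SOUP) do |text, tag|
    tokens << [:text, text] unless text.empty?
    tokens << [:tag, tag]
    prev_end = Regexp.last_match.end(0)
  end
  # Trailing text after the last tag.
  tokens << [:text, str[prev_end..-1]] if prev_end < str.size
  tokens
end

# Backslash escapes become decimal entities so later steps leave them alone.
def process_escapes(str)
  str.gsub('\\\\', '&#92;').gsub('\"', '&#34;').gsub("\\'", '&#39;')
     .gsub('\.', '&#46;').gsub('\-', '&#45;').gsub('\`', '&#96;')
end

def educate_text(text)
  t = process_escapes(text)
  # Old-school dashes: "---" must be handled before "--", otherwise an
  # em-dash shortcut would become an en-dash plus a stray hyphen.
  t = t.gsub('---', '&#8212;').gsub('--', '&#8211;')
  t.gsub('...', '&#8230;').gsub('. . .', '&#8230;')
end

def smarten(html)
  in_pre = false
  out = []
  tokenize(html).each do |type, value|
    if type == :tag
      out << value
      # An opening skip tag sets the flag; a closing one ("</...") clears it.
      in_pre = (Regexp.last_match(1) != '/') if value =~ SKIP_TAG
    else
      out << (in_pre ? value : educate_text(value))
    end
  end
  out.join
end

puts smarten("a -- b --- c... <code>x -- y</code> d\\-e")
# prints: a &#8211; b &#8212; c&#8230; <code>x -- y</code> d&#45;e
```

Inside the `<code>` element the `--` is left alone. The escaped `\-` after it comes out as the `&#45;` entity, a plain hyphen, instead of taking part in dash education.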