module Linguist::BlobHelper

DEPRECATED: Avoid mixing this module into Blob classes. Prefer functional interfaces like `Linguist.detect` over `Blob#language`; functions are much easier to cache and compose.

Avoid adding further bloat to this module.

BlobHelper is a mixin for blob-like classes, such as Grit::Blob, that respond to “name”, “data” and “size”. The path-based predicates (documentation?, vendored? and generated?) also call “path”.
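To make the mixin contract concrete, here is a minimal, self-contained sketch. `MiniBlobHelper` and `FakeBlob` are hypothetical names, not part of Linguist; the three methods copy the bodies of `extname`, `empty?` and `large?` documented below, and `MEGABYTE` is assumed to be 1024 * 1024.

```ruby
# Hypothetical stand-in for Linguist::BlobHelper. Like the real mixin, it
# only relies on the host class providing #name, #data and #size.
module MiniBlobHelper
  MEGABYTE = 1024 * 1024 # assumed value of the real constant

  def extname
    File.extname(name.to_s)
  end

  def empty?
    data.nil? || data == ""
  end

  def large?
    size.to_i > MEGABYTE
  end
end

# Hypothetical "Blobish" host class: any object with name, data and size works.
class FakeBlob
  include MiniBlobHelper
  attr_reader :name, :data, :size

  def initialize(name, data)
    @name = name
    @data = data
    @size = data ? data.bytesize : 0
  end
end

blob = FakeBlob.new("app.rb", "puts 1\n")
blob.extname # => ".rb"
blob.empty?  # => false
blob.large?  # => false
```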

Constants

DETECTABLE_TYPES
DocumentationRegexp
MEGABYTE
VendoredRegexp

Public Instance Methods

_mime_type()

Internal: Lookup mime type for extension.

Returns a MIME::Type, or nil if no type matches the extension.

# File lib/linguist/blob_helper.rb, line 32
def _mime_type
  if defined? @_mime_type
    @_mime_type
  else
    guesses = ::MIME::Types.type_for(extname.to_s)

    # Prefer text mime types over binary
    @_mime_type = guesses.detect { |type| type.ascii? } ||
      # Otherwise use the first guess
      guesses.first
  end
end
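`_mime_type` memoizes with `defined?` rather than `||=`. The difference matters because the lookup can legitimately return nil: `||=` treats a cached nil as "not yet computed" and repeats the work, while `defined?` remembers it. The sketch below uses a hypothetical `Lookup` class with a call counter in place of MIME::Types.

```ruby
# Hypothetical class comparing two memoization styles for a lookup that
# returns nil (as MIME type guessing does for unknown extensions).
class Lookup
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Stand-in for the expensive lookup; always finds nothing.
  def expensive_lookup
    @calls += 1
    nil
  end

  # Style used by _mime_type: a nil result is cached too.
  def with_defined
    if defined? @with_defined
      @with_defined
    else
      @with_defined = expensive_lookup
    end
  end

  # ||= style: a nil (or false) result is recomputed on every call.
  def with_or_equals
    @with_or_equals ||= expensive_lookup
  end
end

a = Lookup.new
3.times { a.with_defined }
a.calls # => 1

b = Lookup.new
3.times { b.with_or_equals }
b.calls # => 3
```

By the same reasoning, methods in this module that use `||=` (such as `language`, `generated?` and `detect_encoding`) recompute whenever their cached value is nil or false.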
binary?()

Public: Is the blob binary?

Returns true or false.

# File lib/linguist/blob_helper.rb, line 130
def binary?
  # Large blobs aren't even loaded into memory
  if data.nil?
    true

  # Treat blank files as text
  elsif data == ""
    false

  # Charlock doesn't know what to think
  elsif encoding.nil?
    true

  # If Charlock says it's binary
  else
    detect_encoding[:type] == :binary
  end
end
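The order of the checks in `binary?` decides the outcome for edge cases such as unloaded or empty blobs. The following hypothetical free-function version, `binary_data?`, takes the data and a hash standing in for the CharlockHolmes detection result, so the decision chain can be exercised without the gem.

```ruby
# Hypothetical free-function version of BlobHelper#binary?.
# `detection` stands in for the CharlockHolmes result hash
# (keys :type and :encoding), or nil when detection failed.
def binary_data?(data, detection)
  return true if data.nil?     # large blobs are never loaded
  return false if data == ""   # blank files count as text
  # No detection result or no encoding: treat as binary.
  return true if detection.nil? || detection[:encoding].nil?
  detection[:type] == :binary  # otherwise trust the detector
end

binary_data?(nil, nil)                                     # => true
binary_data?("", nil)                                      # => false
binary_data?("abc", nil)                                   # => true
binary_data?("abc", { type: :text, encoding: "UTF-8" })    # => false
binary_data?("\x00\x01", { type: :binary, encoding: nil }) # => true
```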
binary_mime_type?()

Internal: Is the blob binary according to its mime type?

Returns true or false.

# File lib/linguist/blob_helper.rb, line 60
def binary_mime_type?
  _mime_type ? _mime_type.binary? : false
end
content_type()

Public: Get the Content-Type header value

This value is used when serving raw blobs.

Examples

# => 'text/plain; charset=utf-8'
# => 'application/octet-stream'

Returns a content type String.

# File lib/linguist/blob_helper.rb, line 83
def content_type
  @content_type ||= (binary_mime_type? || binary?) ? mime_type :
    (encoding ? "text/plain; charset=#{encoding.downcase}" : "text/plain")
end
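The branching in `content_type` can be isolated as a plain function. In this hypothetical sketch, `content_type_for` is not a Linguist method: `binary` stands for `binary_mime_type? || binary?`, `mime_type` for the guessed MIME type string, and `encoding` for the detected encoding name (or nil).

```ruby
# Hypothetical sketch of the content_type decision.
def content_type_for(binary:, mime_type:, encoding:)
  if binary
    mime_type                                  # serve binaries as-is
  elsif encoding
    "text/plain; charset=#{encoding.downcase}" # text with known charset
  else
    "text/plain"                               # text, charset unknown
  end
end

content_type_for(binary: false, mime_type: "text/html", encoding: "UTF-8")
# => "text/plain; charset=utf-8"
content_type_for(binary: true, mime_type: "application/octet-stream", encoding: nil)
# => "application/octet-stream"
content_type_for(binary: false, mime_type: "text/plain", encoding: nil)
# => "text/plain"
```

Note that any blob judged to be text is served as `text/plain`, even when `mime_type` would report something else, such as `text/html`.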
csv?()

Public: Is this blob a CSV file?

Returns true or false.

# File lib/linguist/blob_helper.rb, line 180
def csv?
  text? && extname.downcase == '.csv'
end
detect_encoding()

Tries to guess the encoding of the blob's data.

Returns a Hash from CharlockHolmes that may include :type, :encoding, :ruby_encoding and :confidence.

Returns nil if data is nil, if an error occurred during detection, or if no valid encoding could be found.

# File lib/linguist/blob_helper.rb, line 123
def detect_encoding
  @detect_encoding ||= CharlockHolmes::EncodingDetector.new.detect(data) if data
end
disposition()

Public: Get the Content-Disposition header value

This value is used when serving raw blobs.

Examples

# => "attachment; filename=file.tar"
# => "inline"

Returns a content disposition String.

# File lib/linguist/blob_helper.rb, line 96
def disposition
  if text? || image?
    'inline'
  elsif name.nil?
    "attachment"
  else
    "attachment; filename=#{EscapeUtils.escape_url(name)}"
  end
end
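A stdlib-only sketch of the same decision, with a hypothetical `disposition_for` function whose `inline` flag stands for `text? || image?`. `EscapeUtils.escape_url` comes from a gem, so the sketch swaps in `CGI.escape`, which percent-encodes reserved characters and turns spaces into `+`; EscapeUtils may differ in edge cases.

```ruby
require "cgi"

# Hypothetical sketch of BlobHelper#disposition. CGI.escape is a stand-in
# for EscapeUtils.escape_url and may not match it exactly.
def disposition_for(name, inline:)
  if inline          # text? || image?
    "inline"
  elsif name.nil?
    "attachment"
  else
    "attachment; filename=#{CGI.escape(name)}"
  end
end

disposition_for("file.tar", inline: false)    # => "attachment; filename=file.tar"
disposition_for("logo.png", inline: true)     # => "inline"
disposition_for(nil, inline: false)           # => "attachment"
disposition_for("my file.zip", inline: false) # => "attachment; filename=my+file.zip"
```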
documentation?()

Public: Is the blob in a documentation directory?

Documentation files are ignored by language statistics.

See “documentation.yml” for a list of documentation conventions that match this pattern.

Returns true or false.

# File lib/linguist/blob_helper.rb, line 250
def documentation?
  path =~ DocumentationRegexp ? true : false
end
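`DocumentationRegexp` combines the patterns from documentation.yml into a single regexp, and `documentation?` turns the result of `=~` (a match index or nil) into a boolean. The sketch below shows that shape with a few hypothetical patterns, not copied from the real file, combined here with `Regexp.union` (one possible way to build the alternation). `vendored?` follows the same pattern with `VendoredRegexp`.

```ruby
# Hypothetical documentation patterns, for illustration only; the real
# list lives in Linguist's documentation.yml.
DOC_PATTERNS = [
  %r{^docs?/},
  %r{(^|/)README},
  %r{(^|/)CHANGELOG}i
].freeze

DOC_REGEXP = Regexp.union(DOC_PATTERNS)

# Same boolean coercion as documentation?: =~ returns an index or nil.
def documentation_path?(path)
  path =~ DOC_REGEXP ? true : false
end

documentation_path?("docs/index.md")  # => true
documentation_path?("lib/README.txt") # => true
documentation_path?("lib/app.rb")     # => false
```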
empty?()

Public: Is the blob empty?

Returns true or false.

# File lib/linguist/blob_helper.rb, line 152
def empty?
  data.nil? || data == ""
end
encoding()

Returns the name of the encoding found by detect_encoding (for example "UTF-8"), or nil if detection failed.

# File lib/linguist/blob_helper.rb, line 106
def encoding
  if hash = detect_encoding
    hash[:encoding]
  end
end
extname()

Public: Get the extname of the path

Examples

blob(name='foo.rb').extname
# => '.rb'

Returns a String

# File lib/linguist/blob_helper.rb, line 25
def extname
  File.extname(name.to_s)
end
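Several predicates (image?, pdf?, solid?, csv?) compare `extname.downcase` against fixed strings, so it helps to know what `File.extname` returns for common names. These are standard Ruby results: only the final extension is reported, and dotfiles have none.

```ruby
# File.extname edge cases that affect the extension-based predicates.
File.extname("foo.rb")             # => ".rb"
File.extname("foo.tar.gz")         # => ".gz"   (only the last extension)
File.extname("Makefile")           # => ""
File.extname(".bashrc")            # => ""      (a leading dot is not an extension)
File.extname(nil.to_s)             # => ""      (name.to_s guards against nil)
File.extname("PHOTO.PNG").downcase # => ".png"  (why the predicates downcase)
```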
generated?()

Public: Is the blob a generated file?

Generated source code is suppressed in diffs and is ignored by language statistics.

May load Linguist::Blob#data

Returns true or false.

# File lib/linguist/blob_helper.rb, line 318
def generated?
  @_generated ||= Generated.generated?(path, lambda { data })
end
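`generated?` passes `lambda { data }` rather than `data` itself, presumably so that `Generated.generated?` can skip reading the blob when the path alone settles the question; that is why the method only "may" load data. A minimal sketch of the pattern, with a hypothetical `generated_file?` function and made-up rules that are not Linguist's:

```ruby
# Hypothetical sketch of the lazy-data pattern used by generated?.
# The path is checked first; the data is only loaded, by calling the
# lambda, when the path alone is inconclusive. Rules are illustrative.
def generated_file?(path, data_loader)
  return true if path.end_with?(".min.js") # decided by path alone
  data = data_loader.call                  # load content only now
  data.start_with?("// Code generated")
end

loads = 0
loader = lambda { loads += 1; "// Code generated by tool. DO NOT EDIT.\n" }

generated_file?("app.min.js", loader) # => true
loads                                 # => 0
generated_file?("gen.go", loader)     # => true
loads                                 # => 1
```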
high_ratio_of_long_lines?()

Internal: Does the blob have a high ratio of long lines, that is, does its size divided by its number of lines exceed 5000?

Returns true or false.

# File lib/linguist/blob_helper.rb, line 210
def high_ratio_of_long_lines?
  return false if loc == 0
  size / loc > 5000
end
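A worked version of the check with plain integers, using a hypothetical `high_ratio?` function. `size` is the blob's size (typically bytes) and `loc` its line count; because `size / loc` is Integer division, the average is truncated before the comparison.

```ruby
# Worked example of the high_ratio_of_long_lines? arithmetic.
def high_ratio?(size, loc)
  return false if loc == 0 # avoid dividing by zero
  size / loc > 5000
end

high_ratio?(12_000, 2)    # => true   (6000 per line on average)
high_ratio?(10_001, 2)    # => false  (10_001 / 2 == 5000, not > 5000)
high_ratio?(500_000, 400) # => false  (1250 per line)
high_ratio?(0, 0)         # => false
```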
image?()

Public: Is the blob a supported image format?

Returns true or false.

# File lib/linguist/blob_helper.rb, line 166
def image?
  ['.png', '.jpg', '.jpeg', '.gif'].include?(extname.downcase)
end
include_in_language_stats?()

Internal: Should this blob be included in repository language statistics?

# File lib/linguist/blob_helper.rb, line 339
def include_in_language_stats?
  !vendored? &&
  !documentation? &&
  !generated? &&
  language && DETECTABLE_TYPES.include?(language.type)
end
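The conditions combine as a simple conjunction. In this hypothetical sketch, `counts_toward_stats?` and `FakeLanguage` stand in for the blob's predicates and its Language object, and `DETECTABLE_TYPES` is assumed to be `[:programming, :markup]`. Unlike the original, which can return nil when no language is detected, the sketch returns a strict boolean.

```ruby
# Hypothetical sketch of include_in_language_stats?.
# DETECTABLE_TYPES is an assumed value, not read from Linguist.
DETECTABLE_TYPES = [:programming, :markup].freeze
FakeLanguage = Struct.new(:name, :type)

def counts_toward_stats?(vendored:, documentation:, generated:, language:)
  !vendored && !documentation && !generated &&
    !language.nil? && DETECTABLE_TYPES.include?(language.type)
end

ruby = FakeLanguage.new("Ruby", :programming)
yaml = FakeLanguage.new("YAML", :data)

counts_toward_stats?(vendored: false, documentation: false, generated: false, language: ruby) # => true
counts_toward_stats?(vendored: true, documentation: false, generated: false, language: ruby)  # => false
counts_toward_stats?(vendored: false, documentation: false, generated: false, language: yaml) # => false
counts_toward_stats?(vendored: false, documentation: false, generated: false, language: nil)  # => false
```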
language()

Public: Detects the Language of the blob.

May load Linguist::Blob#data

Returns a Language or nil if none is detected

# File lib/linguist/blob_helper.rb, line 327
def language
  @language ||= Linguist.detect(self)
end
large?()

Public: Is the blob too big to load?

Returns true or false.

# File lib/linguist/blob_helper.rb, line 196
def large?
  size.to_i > MEGABYTE
end
likely_binary?()

Internal: Is the blob binary according to its mime type, overriding it if we have better data from the languages.yml database.

Returns true or false.

# File lib/linguist/blob_helper.rb, line 69
def likely_binary?
  binary_mime_type? && !Language.find_by_filename(name)
end
lines()

Public: Get each line of data

Requires Linguist::Blob#data

Returns an Array of lines

# File lib/linguist/blob_helper.rb, line 259
def lines
  @lines ||=
    if viewable? && data
      # `data` is usually encoded as ASCII-8BIT even when the content has
      # been detected as a different encoding. However, we are not allowed
      # to change the encoding of `data` because we've made the implicit
      # guarantee that each entry in `lines` is encoded the same way as
      # `data`.
      #
      # Instead, we re-encode each possible newline sequence as the
      # detected encoding, then force them back to the encoding of `data`
      # (usually a binary encoding like ASCII-8BIT). This means that the
      # byte sequence will match how newlines are likely encoded in the
      # file, but we don't have to change the encoding of `data` as far as
      # Ruby is concerned. This allows us to correctly parse out each line
      # without changing the encoding of `data`, and
      # also--importantly--without having to duplicate many (potentially
      # large) strings.
      begin
        encoded_newlines = ["\r\n", "\r", "\n"].
          map { |nl| nl.encode(ruby_encoding, "ASCII-8BIT").force_encoding(data.encoding) }

        data.split(Regexp.union(encoded_newlines), -1)
      rescue Encoding::ConverterNotFoundError
        # The data is not splittable in the detected encoding.  Assume it's
        # one big line.
        [data]
      end
    else
      []
    end
end
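The newline trick described in the comment is easiest to see with UTF-16LE content, where every newline is two bytes (`\n` followed by a NUL). This stdlib-only sketch applies the same technique to a binary string; the variable names are illustrative, and, as a simplification, the newline literals are encoded from Ruby's default UTF-8 source encoding instead of from "ASCII-8BIT".

```ruby
# Sketch of the encoding-aware newline split used by lines.
detected = Encoding::UTF_16LE
data = "one\r\ntwo\nthree".encode(detected).b # raw bytes, tagged binary

# Encode each newline in the detected encoding, then retag as binary so
# the byte patterns can be matched against data without re-encoding it.
encoded_newlines = ["\r\n", "\r", "\n"].map do |nl|
  nl.encode(detected).force_encoding(data.encoding)
end

lines = data.split(Regexp.union(encoded_newlines), -1)
lines.map { |l| l.dup.force_encoding(detected).encode("UTF-8") }
# => ["one", "two", "three"]

# A naive split on the single byte "\n" misaligns the UTF-16 code units:
data.split("\n".b).map(&:bytesize)
# => [8, 7, 11]
```

Each element of `lines` stays in the binary encoding of `data`, which is the guarantee the comment describes.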
loc()

Public: Get number of lines of code

Requires Linguist::Blob#data

Returns Integer

# File lib/linguist/blob_helper.rb, line 297
def loc
  lines.size
end
mime_type()

Public: Get the actual blob mime type

Examples

# => 'text/plain'
# => 'text/html'

Returns a mime type String.

# File lib/linguist/blob_helper.rb, line 53
def mime_type
  _mime_type ? _mime_type.to_s : 'text/plain'
end
pdf?()

Public: Is the blob a PDF?

Returns true or false.

# File lib/linguist/blob_helper.rb, line 187
def pdf?
  extname.downcase == '.pdf'
end
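ruby_encoding()

Returns the Ruby encoding name found by detect_encoding, which lines uses to encode newline sequences, or nil if detection failed.

# File lib/linguist/blob_helper.rb, line 112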
def ruby_encoding
  if hash = detect_encoding
    hash[:ruby_encoding]
  end
end
safe_to_colorize?()

Public: Is the blob safe to colorize?

Returns true or false.

# File lib/linguist/blob_helper.rb, line 203
def safe_to_colorize?
  !large? && text? && !high_ratio_of_long_lines?
end
sloc()

Public: Get number of source lines of code

Requires Linguist::Blob#data

Returns Integer

# File lib/linguist/blob_helper.rb, line 306
def sloc
  lines.grep(/\S/).size
end
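Because `lines` splits with a limit of -1, trailing empty strings are kept, so a final newline produces an extra empty "line" that `loc` counts and `sloc` does not. A small example with a plain string and a UTF-8 newline regexp (a simplification of the encoding-aware split in `lines`):

```ruby
# loc vs sloc on the same data.
data  = "def a\n\n  1\nend\n"
lines = data.split(/\r\n|\r|\n/, -1)
lines                 # => ["def a", "", "  1", "end", ""]
lines.size            # => 5  (loc)
lines.grep(/\S/).size # => 3  (sloc: lines with a non-whitespace character)
```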
solid?()

Public: Is the blob a supported 3D model format?

Returns true or false.

# File lib/linguist/blob_helper.rb, line 173
def solid?
  extname.downcase == '.stl'
end
text?()

Public: Is the blob text?

Returns true or false.

# File lib/linguist/blob_helper.rb, line 159
def text?
  !binary?
end
tm_scope()

Internal: Get the TextMate compatible scope for the blob

# File lib/linguist/blob_helper.rb, line 332
def tm_scope
  language && language.tm_scope
end
vendored?()

Public: Is the blob in a vendored directory?

Vendored files are ignored by language statistics.

See “vendor.yml” for a list of vendored conventions that match this pattern.

Returns true or false.

# File lib/linguist/blob_helper.rb, line 235
def vendored?
  path =~ VendoredRegexp ? true : false
end
viewable?()

Public: Is the blob viewable?

Non-viewable blobs just show a “View Raw” link.

Returns true or false.

# File lib/linguist/blob_helper.rb, line 220
def viewable?
  !large? && text?
end
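viewable? and safe_to_colorize? differ only in the long-line check, so a blob can be viewable yet not safe to colorize. In this hypothetical side-by-side, `viewable_blob?` and `colorizable_blob?` take plain arguments in place of the blob's own `size`, `text?` and `loc`, and `MEGABYTE` is assumed to be 1024 * 1024.

```ruby
# Hypothetical side-by-side of viewable? and safe_to_colorize?.
MEGABYTE = 1024 * 1024 # assumed value of the real constant

def viewable_blob?(size:, text:)
  size.to_i <= MEGABYTE && text # !large? && text?
end

def colorizable_blob?(size:, text:, loc:)
  long_lines = loc != 0 && size / loc > 5000 # high_ratio_of_long_lines?
  viewable_blob?(size: size, text: text) && !long_lines
end

# One 600,000-byte line of minified text: viewable, but not colorized.
viewable_blob?(size: 600_000, text: true)            # => true
colorizable_blob?(size: 600_000, text: true, loc: 1) # => false
```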